This article studies the estimation of the causal effect of a time-
varying treatment on time-to-an-event or on some other continuously
distributed outcome. The paper applies to the situation where treat- 
ment is repeatedly adapted to time-dependent patient characteristics. 
The treatment effect cannot be estimated by simply conditioning on 
these time-dependent patient characteristics, as they may themselves 
be indications of the treatment effect. This time-dependent confound-
ing is common in observational studies. Robins [(1992) Biometrika 79
321-334, (1998b) Encyclopedia of Biostatistics 6 4372-4389] has pro- 
posed the so-called structural nested models to estimate treatment 
effects in the presence of time-dependent confounding. In this arti- 
cle we provide a conceptual framework and formalization for struc- 
tural nested models in continuous time. We show that the resulting 
estimators are consistent and asymptotically normal. Moreover, as 
conjectured in Robins [(1998b) Encyclopedia of Biostatistics 6 4372- 
4389], a test for whether treatment affects the outcome of interest 
can be performed without specifying a model for treatment effect. 
We illustrate the ideas in this article with an example. 

1. Introduction. Causality is a topic which nowadays receives much at-
tention. Statisticians, epidemiologists, biostatisticians, social scientists, com-
puter scientists [especially those in artificial intelligence, see, e.g., Pearl
(2000)], econometricians and philosophers are investigating questions like 
"what would have happened if" and "what would happen if." This article 
discusses estimating the effect of a time-varying treatment. As a recurring
example, this article focuses on the effect of a medical treatment which is 
adapted to a patient's state during the course of time.

Large observational studies have become widely used in medical research
when data from randomized experiments are not available. Randomized clin- 
ical trials are often expensive, impractical, and sometimes unfeasible for eth- 
ical reasons because treatment is withheld from some patients regardless of 
medical considerations. Also, in some instances, exploratory investigations 
using nonexperimental data are used before conducting a randomized trial. 
In observational studies there is no pre-specified treatment protocol. Data 
are collected on patient characteristics and treatments in the course of the 
normal interaction between patients and doctors. Obviously, it is consid- 
erably more difficult to draw correct causal conclusions from observational 
data than from a randomized experiment. The main reason is the so-called 
confounding by indication or selection bias. For example, doctors may pre- 
scribe more medication to patients who are relatively unhealthy. Thus, as- 
sociation between medication dose and health outcomes may arise not only 
from the treatment effect but also from the way the treatment was assigned. 

If this confounding by indication only takes place at the start of the treat- 
ment, one can condition on initial patient characteristics or covariates, such 
as blood pressure or number of white blood cells, in order to remove the 
effect of the confounding, and get meaningful estimates of the treatment ef- 
fect. Linear regression, logistic regression or Cox regression can be used for 
this purpose. However, estimating treatment effects is more difficult if treat- 
ment decisions after the start of the treatment are adapted to the state of 
the patients in subsequent periods. Treatment might be influenced by a pa- 
tient's state in the past, which was influenced by treatment decisions before; 
thus, simply conditioning on a patient's state in the past means disregarding 
information on the effect of past treatment. In such a case, even the well- 
known time-dependent Cox model does not answer the question of whether, 
or how, treatment affects the outcome of interest. The time-dependent Cox 
model studies the rate at which some event of interest happens (e.g., the 
patient dying), given past treatment- and covariate history. However, under 
time-dependent confounding, past covariate values may have been influenced 
by previous treatment. The net effect of treatment can thus not be derived 
from just this rate; see, for example, Robins (1998b), Keiding (1999) or Lok 
(2001). 

Structural nested models, proposed in Robins (1989), Lok, Gill, van der 
Vaart and Robins (2004) and Robins (1992, 1998b) to solve practical prob- 
lems in epidemiology and biostatistics, effectively overcome these difficulties 
and estimate the effect of time-varying treatments. The main assumption
underlying these models is that all information the doctors used to make 
treatment decisions, and which is predictive of the patient's prognosis with 
respect to the final outcome, is available for analysis. This assumption of "no 
unmeasured confounding" makes it possible to distinguish between treat- 
ment effect and selection bias. What data have to be collected to satisfy this 
assumption of no unmeasured confounding is for subject matter experts to
decide. All of the past treatment- and covariate information which both 
(i) influences a doctor's treatment decisions and (ii) is relevant for a pa- 
tient's prognosis with respect to the outcome of interest, has to be recorded. 
In Section 5 of Robins (1998b) and in Section 8.1 of Robins, Rotnitzky 
and Scharfstein (2000), a sensitivity analysis methodology for estimation of 
structural nested models is developed that does not assume no unmeasured 
confounders. Beyond treatment and covariates, the data requirements also 
include the measure of an outcome of interest; for example, survival time, 
time to clinical AIDS or CD4 count after the treatment period. 

Lok et al. (2004) study structural nested models in discrete time. These 
models assume that changes in the values of the covariates and treatment 
decisions take place at finitely many deterministic times, which are the same 
for all patients and known in advance. Lok et al. (2004) also assume that 
covariates and treatment take values in a discrete space. They indicate why 
it is reasonable to expect consistency and asymptotic normality in discrete 
time, and they refer to Lok (2001) for the proofs. Gill and Robins (2001) 
generalize Lok et al. (2004) to covariates and treatment taking values in $\mathbb{R}^k$.

In this article we consider structural nested models in continuous time, 
proposed in Robins (1992, 1998b). Structural nested models in continuous 
time allow for both changes in the values of the covariates and treatment 
decisions to take place at arbitrary times for different patients. As noted in 
Robins (1998a), structural nested models in continuous time assume that 
a short duration of treatment has only a small effect on the distribution of 
the outcome of interest. The effect of the treatment on an individual patient 
may be large, but then the probability of such effect has to be small for any 
particular short duration of treatment (see page 7, bottom). 

This article provides a conceptual framework and mathematical formal- 
ization of these practical methods, solving important outstanding problems 
and contributing to the causality discussion, especially for the time ordered 
and continuous time case. In particular, this article proves the conjectures 
in Robins (1998b) that structural nested models in continuous time lead to 
estimators which are both consistent and asymptotically normal. The proof 
simplifies considerably for structural nested models in discrete time (see our 
Discussion, Section 12). This article also proves that a test related to the 
score test can be used to investigate whether treatment affects the outcome 
of interest without specifying a model for the treatment effect. 

2. Setting and notation. The setting to which structural nested models 
in continuous time apply is as follows. The outcome of interest, from now on 
called Y, is a continuous real variable, for example, the survival time of a
patient, time to clinical AIDS, or CD4 count after the treatment period. We 
wish to estimate the effect of treatment on the outcome Y. There is some 
fixed time interval $[0,\tau]$, with $\tau$ a finite time, during which treatment and
patient characteristics are observed for each patient. We suppose that after
time $\tau$ treatment is stopped or switched to some kind of baseline treatment.
In this article we assume that there is no censoring, and that the outcome Y 
is observed for every patient in the study. See, for example, Robins (1998b), 
Hernan et al. (2005) and Lok (2007) for ideas about dealing with censoring. 

We denote the probability space by $(\Omega, \mathcal{F}, P)$. The covariate process
describes the course of the patient characteristics, for example, the course of
the blood pressure and the white blood cell count. We assume that a realization
of this covariate process is a function from $[0,\tau]$ to $\mathbb{R}^d$, and that such a
sample path is continuous from the right with limits from the left (cadlag). 
The covariates which must be included are those which both (i) influence 
a doctor's treatment decisions and (ii) possibly predict a patient's prognosis
with respect to the outcome of interest. If such covariates are not
observed, the assumption of no unmeasured confounding, mentioned in the
introduction, will not hold.

For the moment consider one single patient. We write $Z(t)$ for the covariate-
and treatment values at time t. We assume that $Z(t)$ takes values in $\mathbb{R}^m$, and
that $Z(t) : \Omega \to \mathbb{R}^m$ is measurable for each $t \in [0,\tau]$. Moreover, we assume
that Z, seen as a function on $[0,\tau]$, is cadlag. We write $\bar{Z}_t = (Z(s) : 0 \le s \le t)$
for the covariate- and treatment history until time t, and $\bar{\mathcal{Z}}_t$ for the space
of cadlag functions from $[0,t]$ to $\mathbb{R}^m$ in which $\bar{Z}_t$ takes its values. Similarly,
we write $\bar{Z}$ for the whole covariate- and treatment history of the patient on
the interval $[0,\tau]$, and $\bar{\mathcal{Z}}$ for the space in which $\bar{Z}$ takes its values. In this
article we choose the projection $\sigma$-algebra as the $\sigma$-algebra on $\bar{\mathcal{Z}}_t$ and $\bar{\mathcal{Z}}$;
measurability of $Z(s)$ for each $s \le t$ is then equivalent to measurability of
the random variable $\bar{Z}_t$. For technical reasons, we include in Z a counter of
the number of jump times of the measured treatment- and covariate process. 
We suppose that observations on different patients are independent. 

3. Counterfactual outcomes. Structural nested models are models for
relations between so-called counterfactuals. Consider for a moment just one
patient. In reality this patient received a certain treatment and had final
outcome Y. If his or her actual treatment had been stopped at time t, the
patient's final outcome would possibly have been different. The outcome he
or she would have had in that case we call $Y^{(t)}$. Of course, $Y^{(t)}$ is generally
not observed, because the patient's actual treatment after t is usually different
from no treatment; it is a counterfactual outcome. Instead of stopping
treatment, one can also consider switching to some kind of baseline treatment,
for example, standard treatment. Figure 1 illustrates the nature of
counterfactual outcomes. We suppose that all counterfactual outcomes $Y^{(t)}$,
for $t \in [0,\tau]$ and for all patients, are random variables on the probability
space $(\Omega, \mathcal{F}, P)$.

Fig. 1. Observed and counterfactual outcomes.



4. No unmeasured confounding. To formalize the assumption of no unmeasured
confounding, consider the history of a particular patient. Decisions
of the doctors at time t may be based, in part, on recorded information
on the state of the patient and treatment before t, that is, on
$\bar{Z}_{t-} = (Z(s) : 0 \le s < t)$, but not on other features predicting the outcome of
the patient. In particular, changes of treatment at time t should
be independent of $Y^{(t)}$, the outcome of the patient in case he or she would
not have been treated after time t, given $\bar{Z}_{t-}$.

Note that $Y^{(t)}$ is an indication of the prognosis of the patient which
does not depend on treatment decisions at or after time t, since it is the
counterfactual outcome which we would have observed if treatment had
been stopped at time t. Only if treatment had no effect could the
observed outcome Y play this role. This is why Robins' assumption
of no unmeasured confounding demands the independence, given $\bar{Z}_{t-}$, of
treatment decisions at time t and $Y^{(t)}$. Similar conditions, though without
time-dependence, can be found in, for example, Rosenbaum and Rubin
(1983). 

The statement "changes of treatment at t should be independent of $Y^{(t)}$,
the outcome of the patient in case he or she would not have been treated
after time t, given $\bar{Z}_{t-}$" is not a formal statement: it includes conditioning
of null events (since the probability that treatment changes at t may be
0 for every fixed t) on null events ($\bar{Z}_{t-}$).

To overcome this difficulty, we assume that the treatment process can be
represented by or generates a (possibly multivariate) counting process N.
For instance, N(t) registers the number of treatment changes until time t
and/or the number of times treatment reached a certain level until time t.
A counting process constructed this way may serve as N in the following.
More about counting processes can be found in, for example, Andersen et al.
(1993). We assume that the treatment process N has an intensity process.
Formally, such an intensity process $\lambda(t)$ is a predictable process such that
$N(t) - \int_0^t \lambda(s)\,ds$ is a martingale. The intensity $\lambda(t)$ with respect to $\sigma(\bar{Z}_t)$
can be interpreted as the rate at which the counting process N jumps given
the past treatment- and covariate history $\bar{Z}_{t-}$.

Assumption 4.1 (Bounded intensity process). N has an intensity process
$\lambda(t)$ on $[0,\tau]$ with respect to the filtration $\sigma(\bar{Z}_t)$. This intensity process
satisfies the following regularity conditions:

(a) $\lambda$ is bounded by a constant which does not depend on $\omega$,

(b) $\lambda(t)$ is continuous from the left.

According to this assumption,

(1)    $M(t) = N(t) - \int_0^t \lambda(s)\,ds$

is a martingale on $[0,\tau]$ with respect to the filtration $\sigma(\bar{Z}_t)$. Since most
counting process martingale theory deals with filtrations $\mathcal{F}_t$ which satisfy
the usual conditions ($\mathcal{F}_0$ contains all null sets and $\mathcal{F}_t = \bigcap_{s>t} \mathcal{F}_s$), we mention
that, under Assumption 4.1, M(t) is also a martingale with respect to
$\sigma(\bar{Z}_t)^a$, the usual augmentation of $\sigma(\bar{Z}_t)$. This follows from Lemma 67.10 in
Rogers and Williams (1994), since M is cadlag.
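
The martingale property in (1) can be checked numerically in the simplest special case. The sketch below is only an illustration under a strong simplifying assumption that is not made in this article: the intensity is taken to be a constant (a homogeneous Poisson process), whereas in general it depends on the treatment- and covariate history. The function names, the rate 2.0 and the horizon 1.5 are hypothetical choices. It simulates many paths of N and averages N(tau) minus the integrated intensity, which should be close to zero.

```python
import random


def simulate_compensated_count(rate, tau, rng):
    """Return M(tau) = N(tau) - rate * tau for one homogeneous Poisson path."""
    t, jumps = 0.0, 0
    while True:
        t += rng.expovariate(rate)  # exponential waiting time until the next jump
        if t > tau:
            break
        jumps += 1
    return jumps - rate * tau


def mean_compensated_count(rate=2.0, tau=1.5, n_paths=20000, seed=0):
    """Monte Carlo average of M(tau); a martingale started at 0 has mean 0."""
    rng = random.Random(seed)
    total = sum(simulate_compensated_count(rate, tau, rng) for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    print(mean_compensated_count())
```

The average is not exactly zero in any finite simulation; with these settings its standard error is roughly 0.012.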

Often, N will be chosen to count the number of events of a certain 
type concerning the treatment process (e.g., the number of times treatment 
changed). At $\tau$, the time the study ends, treatment is stopped or switched
to baseline treatment, so a natural choice of N will often jump at $\tau$ with
positive probability. However, jumps of N at $\tau$ are not useful for estimation,
and we wish to avoid modeling jumps of N at $\tau$. Therefore, we assume that,
with probability 1, N does not jump at $\tau$, and if a natural choice of N does
jump at $\tau$ with positive probability, then we just adapt it, only at $\tau$, so that
it does not jump there.

We also make the following assumption. 

Assumption 4.2 ($Y^{(\cdot)}$ cadlag). $Y^{(\cdot)}$ is a cadlag process.

Within this framework, the assumption of no unmeasured confounding 
could be operationalized as follows. The rate at which the counting process 
N jumps given past treatment- and covariate history is also the rate at which
N jumps given past treatment- and covariate history and $\bar{Y}^{(t)} = (Y^{(s)} : s \le t)$.
That is, the following:

Assumption 4.3 (No unmeasured confounding: formalization). The intensity
process $\lambda(t)$ of N with respect to $\sigma(\bar{Z}_t)$ is also an intensity process
of N with respect to $\sigma(\bar{Z}_t, \bar{Y}^{(t)})$.

This can be interpreted as conditional independence (given $\bar{Z}_{t-}$) of treatment
decisions at time t and $(Y^{(s)} : s \le t)$. This assumption is stronger than
just conditional independence of treatment decisions at t and $Y^{(t)}$ as assumed
in Robins (1992, 1998b). However, $Y^{(s)}$ for s < t is also an indication
of the patient's prognosis upon which treatment decisions at time t (> s)
should not depend. Assumption 4.3 allows us to use the usual counting
processes framework. Under Assumption 4.3, $M(t) = N(t) - \int_0^t \lambda(s)\,ds$ is
a martingale also with respect to $\sigma(\bar{Z}_t, \bar{Y}^{(t)})$ and its usual augmentation
$\sigma(\bar{Z}_t, \bar{Y}^{(t)})^a$.

This formalization of the assumption of no unmeasured confounding in 
terms of compensators with respect to the filtration $\sigma(\bar{Z}_t, \bar{Y}^{(t)})$ is a novel
feature of this paper relative to the previous literature on structural nested
models. Robins et al. (1992), Robins (1998b) and Keiding (1999) use a Cox
model for initiation and/or changes in treatment. However, none of them
formalized the assumption of no unmeasured confounders in terms of compensators
with respect to $\sigma(\bar{Z}_t, \bar{Y}^{(t)})$. As a consequence, they could not use
the extensive theory on counting process martingales to show the asymptotics
of their estimators, which then remained without proof.

5. The model for treatment effect. Structural nested models in continuous
time model distributional relations between $Y^{(t)}$ and $Y^{(t+h)}$, for h > 0
small, through a so-called infinitesimal shift-function D. Write F for the
cumulative distribution function and $F^{-1} : (0,1) \to \mathbb{R}$ for its generalized
inverse

$F^{-1}(p) = \inf\{x : F(x) \ge p\}$.

Then the infinitesimal shift-function D is defined as

(2)    $D(y,t;\bar{Z}_t) = \frac{\partial}{\partial h}\Big|_{h=0} \big(F^{-1}_{Y^{(t+h)}|\bar{Z}_t} \circ F_{Y^{(t)}|\bar{Z}_t}\big)(y)$,

the (right-hand) derivative of the quantile-quantile transform which moves
quantiles of the distribution of $Y^{(t)}$ to quantiles of the distribution of $Y^{(t+h)}$
(h > 0), given the covariate- and treatment history until time t, $\bar{Z}_t$. Notice
that for differentiability of $F_{Y^{(t+h)}|\bar{Z}_t}$ with respect to h we need that a short
duration of treatment has only a small effect on the distribution of the
outcome of interest (see page 3, second paragraph), since $\lim_{h \downarrow 0} F_{Y^{(t+h)}|\bar{Z}_t}$
must be equal to $F_{Y^{(t)}|\bar{Z}_t}$.

Example 5.1 (Survival of AIDS patients). Robins, Blevins, Ritter and
Wulfsohn (1992) describe an AIDS clinical trial to study the effect of AZT 
treatment on survival in HIV-infected subjects. Embedded within this trial 
was an essentially uncontrolled observational study of the effect of prophy- 
laxis therapy for PCP on survival. PCP, Pneumocystis carinii pneumonia, 
is an opportunistic infection that afflicts AIDS patients. The aim of Robins 
et al. (1992) was to study the effect of this prophylaxis therapy on survival. 
Thus, the outcome of interest Y is the survival time, and the treatment 
under study is prophylaxis for PCP. Once treatment with prophylaxis for 
PCP started, it was never stopped. 

Suppose that

(3)    $D(y,t;\bar{Z}_t) = (1 - e^{\psi})\, 1_{\{\text{treated at } t\}}$.

Then (see Section 6 for details), for t < Y, withholding treatment from t
onward leads to (with $\sim$ meaning "is distributed as")

(4)    $Y^{(t)} - t \sim \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds = e^{\psi} \cdot DUR(t,Y) + 1 \cdot (Y - t - DUR(t,Y))$ given $\bar{Z}_t$,

with DUR(t, u) the duration of treatment in the interval (t, u). Thus, treated
residual survival time (t until Y) is multiplied by $e^{\psi}$ by withholding treatment;
compare this with accelerated failure time models, see, for example,
Cox and Oakes (1984). This multiplication factor should be interpreted
in a distributional way. One of the models studied in Robins et al. (1992)
assumes that (4) is true even with $\sim$ replaced by = (though only for t = 0).
Notice that supposing (4) to be true with $\sim$ replaced by = would be much
stronger.

Example 5.2 (Survival of AIDS patients). Consider the situation from 
Example 5.1 again. In another model mentioned in Robins, Blevins, Ritter 
and Wulfsohn (1992) the factor with which treated residual survival time is 
multiplied when treatment is withheld can depend on the AZT treatment the 
patient received and whether or not the patient had a history of PCP prior to 
start of PCP prophylaxis. Since this was a clinical trial for AZT treatment, 
the AZT treatment was described by a single variable $I_{AZT}$ indicating the
treatment arm the patient was randomized to ($I_{AZT}$ is 0 or 1). Whether or
not the patient had a PCP history prior to start of prophylaxis is described
by an indicator variable P(t). P(t) equals 1 if the patient had PCP before
or at t and before prophylaxis treatment started; otherwise P(t) equals 0.
If

$D_{\psi_1,\psi_2,\psi_3}(y,t;\bar{Z}_t) = (1 - e^{\psi_1 + \psi_2 P(t) + \psi_3 I_{AZT}})\, 1_{\{\text{treated at } t\}}$,

then (see Section 6 for details) withholding prophylaxis treatment from t
onward leads to

(5)    $Y^{(t)} - t \sim \int_t^Y e^{1_{\{\text{treated at } s\}}(\psi_1 + \psi_2 P(s) + \psi_3 I_{AZT})}\,ds$ given $\bar{Z}_t$,

for t < Y.

Example 5.3 (Incorporating a-priori biological knowledge). Following 
Robins (1998b), again consider survival as the outcome of interest. Suppose 
that it is known that treatment received at time t only affects survival 
for patients destined to die by time t + 5 if they would receive no further 
treatment. An example would be a setting in which failure is death from an 
infectious disease, the treatment is a preventive antibiotic treatment which 
is of no benefit unless the subject is already infected and, if death occurs, it 
always does within five weeks from the time of initial unrecorded subclinical 
infection. In that case, as remarked in Robins (1998b), the natural restriction
on D is that

$D(y,t;\bar{Z}_t) = 0$ if $y - t > 5$.

More biostatistical examples of models for D can be found in, for exam- 
ple, Mark and Robins (1993), Witteman et al. (1998), Robins (1998b) and 
Keiding et al. (1999).

$D(y,t;\bar{Z}_t)$ can be interpreted as the infinitesimal effect on the outcome
Y of the treatment actually given in the time-interval [t, t + h) (relative to
baseline treatment). To be more precise, from the definition of D we have

$h \cdot D(y,t;\bar{Z}_t) = \big(F^{-1}_{Y^{(t+h)}|\bar{Z}_t} \circ F_{Y^{(t)}|\bar{Z}_t}\big)(y) - y + o(h)$.

In Figure 2 this is sketched; y in the picture is the 0.83th quantile of the
distribution of $Y^{(t)}$ given $\bar{Z}_t$. For h > 0, the 0.83th quantile of the distribution
of $Y^{(t+h)}$ given $\bar{Z}_t$ is $y + h \cdot D(y,t;\bar{Z}_t) + o(h)$. Thus, to shift from quantiles
of the distribution of $Y^{(t)}$ to the distribution of $Y^{(t+h)}$ given $\bar{Z}_t$ (h > 0) is
approximately the same as to just add $h \cdot D(y,t;\bar{Z}_t)$ to those quantiles. For
example, if $F_{Y^{(t+h)}|\bar{Z}_t}$ lies to the right of $F_{Y^{(t)}|\bar{Z}_t}$ for h > 0, then treatment
between t and t + h increases the outcome (in distribution), and $D(\cdot,t;\bar{Z}_t)$
is greater than 0.

Fig. 2. Illustration of the infinitesimal shift-function D.
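
As a worked illustration of the quantile-shift interpretation of D, consider a purely hypothetical location-shift model that is not taken from this article: suppose that, given the history up to time t, the counterfactual outcome with treatment stopped at t + h is normally distributed with mean $\mu + \beta h$ and variance $\sigma^2$ for small $h \ge 0$. The symbols $\mu$, $\beta$, $\sigma$ and the normal distribution function $\Phi$ are assumptions made only for this sketch. The quantile-quantile transform is then an exact shift by $\beta h$, so the infinitesimal shift-function is the constant $\beta$:

```latex
\begin{align*}
F_{Y^{(t+h)}\mid \bar{Z}_t}(x) &= \Phi\!\left(\frac{x-\mu-\beta h}{\sigma}\right), \\
F^{-1}_{Y^{(t+h)}\mid \bar{Z}_t}(p) &= \mu + \beta h + \sigma\,\Phi^{-1}(p), \\
\left(F^{-1}_{Y^{(t+h)}\mid \bar{Z}_t}\circ F_{Y^{(t)}\mid \bar{Z}_t}\right)(y)
  &= \mu + \beta h + \sigma\cdot\frac{y-\mu}{\sigma} = y + \beta h, \\
D(y,t;\bar{Z}_t) &= \frac{\partial}{\partial h}\Big|_{h=0}\,(y+\beta h) = \beta .
\end{align*}
```

In this toy model a positive $\beta$ means that every quantile of the outcome distribution moves to the right at rate $\beta$ per unit of additional treatment time.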



Consider again this interpretation of D as the infinitesimal effect of treatment
given in [t, t + h). If the outcome of interest is survival, then $D(y,t;\bar{Z}_t)$
should be zero if $\bar{Z}_t$ indicates the patient is dead at time t. Indeed, in
that case $F_{Y^{(t+h)}|\bar{Z}_t}$ and $F_{Y^{(t)}|\bar{Z}_t}$ should be almost surely the same for every
$h \ge 0$, since withholding treatment after death does not change the survival
time. Thus, $F^{-1}_{Y^{(t+h)}|\bar{Z}_t} \circ F_{Y^{(t)}|\bar{Z}_t}(y)$ is constant in h for $h \ge 0$ and, therefore,
$D(y,t;\bar{Z}_t) = 0$. However, this reasoning is not precise because of the complication
of null sets. We will therefore just formally define $D(y,t;\bar{Z}_t)$ to be
zero if the outcome of interest is survival and $\bar{Z}_t$ indicates the patient is
dead at time t.

It can be shown that $D \equiv 0$ if treatment does not affect the outcome
of interest, as was conjectured in Robins (1998b). To be more precise, Lok
(2001) shows that, for example, $D \equiv 0$ if and only if, for every h > 0 and
t, $Y^{(t+h)}$ has the same distribution as $Y^{(t)}$ given $\bar{Z}_t$. That is, $D \equiv 0$ if and
only if "at any time t, whatever patient characteristics are selected at that
time ($\bar{Z}_t$), stopping 'treatment as given' at some fixed time after t would
not change the distribution of the outcome in patients with these patient
characteristics."

In the rest of this article $D_\psi$ will always indicate a correctly specified
parametric model for D, with $D_\psi \equiv 0$ if $\psi = 0$.

6. Mimicking counterfactual outcomes. Define X(t) as the continuous
solution to the differential equation

(6)    $X'(t) = D(X(t),t;\bar{Z}_t)$

with final condition $X(\tau) = Y$, the observed outcome (see Figure 3). Then
X(t) mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$
given $\bar{Z}_t$. This rather surprising result was conjectured in Robins (1998b)
and proved in Lok (2001, 2004). To prove this result, we need the following
consistency assumption.

Assumption 6.1 (Consistency). $Y^{(\tau)}$ has the same distribution as Y
given $\bar{Z}_\tau$.

Notice that since by assumption no treatment was given after time $\tau$
and since treatment is right-continuous, there is no difference in treatment
between $Y^{(\tau)}$ and Y. We suppose that this assumption holds, and we also
suppose that a short duration of treatment has only a small effect on the
distribution of the outcome of interest ($\lim_{h \downarrow 0} F_{Y^{(t+h)}|\bar{Z}_t}(y) = F_{Y^{(t)}|\bar{Z}_t}(y)$).
Under these assumptions and regularity conditions only, Lok (2001, 2004)
proved that indeed equation (6) has a unique solution for every $\omega$ and
that this solution X(t) mimics $Y^{(t)}$ in the sense that X(t) has the same
distribution as $Y^{(t)}$ given $\bar{Z}_t$ (see Appendix B). Throughout this article we
will assume that this result holds true.

Example 6.2. Survival of AIDS patients (continuation of Example 5.1).
Suppose that

$D(y,t;\bar{Z}_t) = (1 - e^{\psi})\, 1_{\{\text{treated at } t\}}$.

Then

$X(t) = t + \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds$

if Y > t, and X(t) = Y for $t \ge Y$.
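
The closed form in Example 6.2 is easy to evaluate when the treatment history is piecewise constant. The following is a minimal sketch under assumptions that go beyond the text: treatment is recorded as a list of non-overlapping intervals `(start, stop)`, and the names `treated_duration` and `mimic_outcome` are hypothetical. It computes X(t) as t plus the treated time in (t, Y) multiplied by exp(psi) plus the untreated time in (t, Y).

```python
import math


def treated_duration(start, end, treated_intervals):
    """Total time in (start, end) covered by the (non-overlapping) treatment intervals."""
    total = 0.0
    for a, b in treated_intervals:
        total += max(0.0, min(b, end) - max(a, start))
    return total


def mimic_outcome(t, y, psi, treated_intervals):
    """X(t) = t + integral over (t, Y) of exp(psi * 1{treated at s}) ds if Y > t, else Y."""
    if y <= t:
        return y
    dur = treated_duration(t, y, treated_intervals)
    return t + math.exp(psi) * dur + (y - t - dur)


if __name__ == "__main__":
    # Observed survival time 5, treated on (0, 2), psi = log 2:
    # the two treated time units count double, so X(0) = 2 * 2 + 3 = 7.
    print(mimic_outcome(0.0, 5.0, math.log(2.0), [(0.0, 2.0)]))
```

With psi = 0 the function returns the observed outcome, in line with D being identically zero when treatment has no effect.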

Suppose now that one has a correctly specified parametric model for the
infinitesimal shift-function D, $D_\psi$. Then one can calculate $X_\psi(t)$, the
solution to

(7)    $X_\psi'(t) = D_\psi(X_\psi(t),t;\bar{Z}_t)$

with final condition $X_\psi(\tau) = Y$. For the true $\psi$, $X_\psi(t)$ has the same distribution
as $Y^{(t)}$, the outcome with treatment stopped at t, even given all
patient-information at time t, $\bar{Z}_t$. So instead of the unobservable $Y^{(t)}$'s, we
have the observable $X_\psi(t)$'s which for the true $\psi$ mimic the $Y^{(t)}$'s. Although
we do not know the true $\psi$, this result turns out to be very useful, both for
estimating $\psi$ (Sections 8 and 10) and for testing [Section 11; notice that
when testing whether treatment affects the outcome (i.e., whether $D \equiv 0$),
X can simply be calculated from the data ($X \equiv Y$) under the null hypothesis
of no treatment effect].
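
When no closed form is available, the differential equation with its final condition at the end of the observation interval can be integrated numerically, backward in time. The sketch below is one simple way to do this and is not the procedure used in this article: it takes Euler steps from the final time down to a target time, evaluating the shift-function at the midpoint of each step. The names `solve_backward` and `make_shift`, the step count and the interval representation of treatment are assumptions for illustration; `make_shift` encodes the model of Example 5.1, with D set to zero once the patient has died.

```python
import math


def solve_backward(y_obs, tau, shift, t_target=0.0, n_steps=10000):
    """Integrate X'(s) = shift(X(s), s) backward from X(tau) = y_obs down to t_target."""
    ds = (tau - t_target) / n_steps
    x = y_obs
    for k in range(n_steps):
        s_mid = tau - (k + 0.5) * ds  # time at the midpoint of the current step
        x -= ds * shift(x, s_mid)     # Euler step backward in time
    return x


def make_shift(psi, y_obs, treated_intervals):
    """Shift-function (1 - e^psi) * 1{treated at s}, zero after the observed death time."""
    def shift(x, s):
        alive = s < y_obs
        treated = any(a <= s < b for a, b in treated_intervals)
        return (1.0 - math.exp(psi)) if (alive and treated) else 0.0
    return shift


if __name__ == "__main__":
    d = make_shift(math.log(2.0), 5.0, [(0.0, 2.0)])
    # Should be close to 7, the closed-form value t + e^psi * DUR + untreated time.
    print(solve_backward(5.0, 10.0, d))
```

For this piecewise-constant shift-function the Euler scheme is accurate up to about one step length; a shift-function that depends on its first argument would call for a smaller step or a higher-order method.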

7. Local rank preservation. Previous applications of structural nested 
models [see, e.g., Robins et al. (1992), Mark and Robins (1993), Witteman 
et al. (1998) and Keiding et al. (1999)] have assumed the so-called local 
rank preservation condition. Local rank preservation states that $Y^{(\cdot)}$ is a
local solution to (6). However, if $Y^{(\cdot)}$ is locally a solution to (6), it is usually
also globally a solution to (6); see, for example, Theorem A.1 in the
Appendix. Hence, if one knew the parameter $\psi$, every $Y^{(t)}$ would be a deterministic
function of the observed data. Deterministic dependence of counterfactuals
on the observed data is a very strong condition, which, though
untestable, is generally considered implausible. The previous literature [see, 
e.g., Robins (1998b), Robins et al. (1992), Mark and Robins (1993) and 
Keiding et al. (1999)] acknowledged this problem, and conjectured that the 
assumption of local rank preservation could be relaxed in continuous time 
[since it is known that the assumption of local rank preservation can be 
relaxed for structural nested models in discrete time; this was pointed out 
by Robins and Wasserman (1997), and Lok et al. (2004) provided a proof]. 
See Robins (1998b) for a more elaborate discussion. The following exam- 
ple describes a setting where the assumption of local rank preservation is 
implausible.

Example 7.1 (Survival of AIDS patients and local rank preservation).
In the situation of Example 5.1, consider the following thought experiment.
Suppose that two patients had the same covariate history until time t, and
both received the same constant treatment to prevent PCP until time t
(equal $\bar{Z}_t$). Suppose, furthermore, that both patients received no treatment
after time t, that they did not have PCP before time t and both died at
the same time u > t (for both, $Y^{(t)} = u$). Possibly, the first patient would
have had PCP at some time s < t and would have died from it before u
in case he or she would not have been treated. Possibly, the other patient
would not have had PCP in case he or she would not have been treated,
and would have died at the same time u > t as without treatment. Thus,
it is easy to imagine that these patients would have had different outcomes
under no treatment (different $Y^{(0)}$). However, the assumption of local rank
preservation excludes this possibility. 

Local rank preservation is a very strong condition, for which structural 
nested models have previously been attacked. In fact, this article shows that 
the assumption of local rank preservation is not needed for structural nested 
models. However, proofs would be much easier under rank preservation; for 
details, see the remarks before the proofs of Theorems 8.5 and 9.2. See also 
Robins (1998b) for a more informal reasoning. 

8. Estimation of treatment effect. To estimate the infinitesimal shift- 
function D, Robins (1998b) proposes to use a (semi-)parametric model to 
predict future treatment (N in our case) on the basis of past treatment- and 
covariate history $\bar{Z}_{t-}$. This may seem odd, since prediction of treatment is
not what we are interested in. However, we will show that such a model to
predict treatment changes can indeed be a tool to get unbiased estimating
equations for the parameter $\psi$ in the model for D. Moreover, often doctors
may have a better understanding, at least qualitatively, about how decisions
about treatment were made than about the effect of the treatment. In what
follows we will assume that $\lambda_\theta$ is a correctly specified parametric model for
the intensity $\lambda$ of N.

Recall from Section 4 that, under no unmeasured confounding (Assumption
4.3), $Y^{(t)}$ does not contain information about treatment changes given
past treatment- and covariate history $\bar{Z}_{t-}$. Since X(t) has the same distribution
as $Y^{(t)}$ given $\bar{Z}_t$, one could expect that also X(t) does not contain
information about treatment changes given $\bar{Z}_{t-}$. Unfortunately, this reasoning
is not precise: we have to somehow deal with null sets since the
probability that treatment changes at t given past covariate- and treatment
history is often equal to 0 for each t. In Section 9 we will show how this can
be dealt with.

In the current section we present a class of unbiased estimating equations
for $\theta$ and $\psi$. These will be used for the proof in the next section, but they are
also of interest in their own right. In Section 9 we will see that these estimating
equations are in fact martingales, for the true parameters $\theta_0$ and $\psi_0$.

Recall from Section 4 that, under no unmeasured confounding, we have
the martingale $M(t) = N(t) - \int_0^t \lambda(s)\,ds$ with respect to the filtration
$\sigma(\bar{Z}_t, \bar{Y}^{(t)})$ and its usual augmentation. From this martingale we can construct
a whole family of martingales. If $h_t(Y^{(t-)}, \bar{Z}_{t-})$ is a $\sigma(\bar{Z}_t, \bar{Y}^{(t)})^a$-predictable
process, then, under regularity conditions,

$\int_0^t h\,dM = \int_0^t h_s(Y^{(s-)}, \bar{Z}_{s-})\,dM(s)$

is a martingale with respect to $\sigma(\bar{Z}_t, \bar{Y}^{(t)})^a$. For a more formal statement,
we first make sure that $h_t(Y^{(t-)}, \bar{Z}_{t-})$ is predictable. We put the following
restriction on the functions $h_t$ we consider here:

Restriction 8.1. When in this section we consider functions $h_t$ from
$\mathbb{R} \times \bar{\mathcal{Z}}_{t-}$ to $\mathbb{R}$, we assume that they are measurable and satisfy the following:

(a) $h_t$ is bounded by a constant which does not depend on t and $\bar{Z}$,

(b) for all $t_0 \in [0,\tau]$, $y_0 \in \mathbb{R}$ and $\omega \in \Omega$, $h_t(y, \bar{Z}_{t-}(\omega)) \to h_{t_0}(y_0, \bar{Z}_{t_0-}(\omega))$
when $y \to y_0$ and $t \uparrow t_0$.

For such $h_t$, $h_t(Y^{(t-)}, \bar{Z}_{t-})$ is a $\sigma(\bar{Z}_t, \bar{Y}^{(t)})^a$-predictable process:

Lemma 8.2. Suppose that $Y^{(\cdot)}$ is cadlag (Assumption 4.2). Then $h_t(Y^{(t-)}, \bar{Z}_{t-})$
is a $\sigma(\bar{Z}_t, \bar{Y}^{(t)})^a$-predictable process for any $h_t$ satisfying Restriction 8.1.

Proof. $h_t(Y^{(t-)}, \bar{Z}_{t-})$ is adapted. It is also left-continuous: because $Y^{(\cdot)}$
is cadlag, $\lim_{t \uparrow t_0} Y^{(t-)} = Y^{(t_0-)}$ exists. □

Thus, we come to the following lemma: 

Lemma 8.3. Under Assumptions 4.1 (bounded intensity process), 4.2
($Y^{(\cdot)}$ cadlag) and 4.3 (no unmeasured confounding),

$$\int_0^t h_s(Y^{(s-)}, \bar{Z}_{s-})\,(dN(s) - \lambda(s)\,ds)$$

is a martingale on $[0,\tau]$ with respect to $\sigma(\bar{Z}_t, Y^{(t)})^a$ for all $h_t$ satisfying
Restriction 8.1.






Proof. $M(t) = N(t) - \int_0^t \lambda(s)\,ds$ is a martingale on $[0,\tau]$ with respect
to $\sigma(\bar{Z}_t, Y^{(t)})^a$, because of Assumption 4.3. It is of integrable variation
[$E\int_0^\tau |dM(s)| \le E\int_0^\tau (dN(s) + \lambda(s)\,ds) = 2E\int_0^\tau \lambda(s)\,ds$, and $\lambda$ is bounded
(Assumption 4.1)]. Because of Lemma 8.2, $h(t) = h_t(Y^{(t-)}, \bar{Z}_{t-})$ is a
$\sigma(\bar{Z}_t, Y^{(t)})^a$-predictable process. It is also bounded [Restriction 8.1(a)]. Hence,
$\int_0^t h(s)\,dM(s) = \int_0^t h_s(Y^{(s-)}, \bar{Z}_{s-})(dN(s) - \lambda(s)\,ds)$ is an integral of a bounded
predictable process with respect to a martingale of integrable variation, and,
therefore, a $\sigma(\bar{Z}_t, Y^{(t)})^a$-martingale. □
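
A Monte Carlo sanity check of Lemma 8.3 in a toy setting may help fix ideas. The sketch below is not part of the original analysis. It assumes a single treatment change with constant hazard `lam` (so the intensity equals `lam` while the patient is at risk), no covariates, a hypothetical family of counterfactual outcomes $Y^{(t)} = u + 0.3t$ that is independent of the change time, and the bounded weight $h_t(y) = \tanh(y)$. All names and numerical values are illustrative assumptions.

```python
import math
import random


def integral_h_dM(u, t_jump, lam, tau, slope=0.3):
    """int_0^tau h_t(Y^(t-)) (dN(t) - lambda(t) dt) for one simulated subject.

    Assumed toy model: Y^(t) = u + slope * t, h_t(y) = tanh(y), and
    lambda(t) = lam as long as treatment has not yet changed.
    """
    stop = min(t_jump, tau)
    # dN part: h evaluated at the jump time, if the jump occurs before tau.
    jump_term = math.tanh(u + slope * t_jump) if t_jump <= tau else 0.0
    # Compensator part: lam * int_0^stop tanh(u + slope * s) ds in closed form.
    comp = lam * (math.log(math.cosh(u + slope * stop))
                  - math.log(math.cosh(u))) / slope
    return jump_term - comp


def monte_carlo_mean(n=20000, lam=1.0, tau=2.0, seed=0):
    """Average of the stochastic integral over n independent subjects."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)        # drives the counterfactual outcomes
        t_jump = rng.expovariate(lam)  # change time, independent of u
        total += integral_h_dM(u, t_jump, lam, tau)
    return total / n
```

Under these assumptions the average returned by `monte_carlo_mean()` should be close to zero, up to Monte Carlo error, in line with the martingale property at $t = \tau$.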

To construct unbiased estimating equations for $(\theta_0, \psi_0)$, we need to assume
that the probability that $N(\cdot)$ and $Y^{(\cdot)}$ jump at the same time is zero.
This assumption is a formalization of the assumption of no instantaneous
treatment effect as proposed in Robins (1998b), which can be seen as follows.
Given $\bar{Z}_{t-}$ and $Y^{(t-)}$, $N$ jumps at $t$ with rate $\lambda(t)$ (Assumption 4.3, no
unmeasured confounding). $Y^{(\cdot)}$ is a cadlag process (Assumption 4.2), which
thus for every $\omega \in \Omega$ jumps at most countably many times on the finite
time interval $[0,\tau]$. Therefore, if $Y^{(\cdot)}$ and $N$ were to jump at the same time
with positive probability, this would imply a dependence of these jumps;
the obvious interpretation of this dependence would be that a change of
treatment instantaneously affects the outcome of interest.

Assumption 8.4 (No instantaneous treatment effect). The probability
that there exists a $t$ such that $N(\cdot)$ and $Y^{(\cdot)}$ both jump at time $t$ is 0.

Notice that this excludes estimation of the effect of point exposures. For 
example, if treatment is surgery or another point exposure given at some 
time t, the outcome under "treatment stopped at time t" will typically jump 
at time t if treatment affects the outcome of interest, at the same time as the 
treatment itself. However, this assumption does not exclude the possibility 
that the outcome differs depending on whether a patient is treated or not at 
a certain point in time. For example, $Y^{(t+)}$ and $Y^{(t-)}$ may be different when
a virus is contracted at time $t$. The model in this article can accommodate
differences between $Y^{(t+)}$ and $Y^{(t-)}$, as long as the probability that the
observed treatment changes at that precise time is 0. Or, in more generality,
as long as the probability that $N$ jumps at the same time is 0. This was
previously noticed in Robins (1998a), Section 8. The estimating procedures
in this article do not deal with instantaneous treatment effects.

Suppose that the above conditions hold and that $(X(t), \bar{Z}_t) \sim (Y^{(t)}, \bar{Z}_t)$
for $t \in [0,\tau]$ (see Section 6). Then if $D_\psi$ and $\lambda_\theta$ are correctly specified
(parametric) models for $D$ and $\lambda$, respectively, each choice of $h_t$ satisfying
Restriction 8.1 leads to an unbiased estimating equation for both the parameter of
interest $\psi$ and the (nuisance) parameter $\theta$:
Theorem 8.5. Suppose that Assumptions 4.1 (bounded intensity process),
4.2 ($Y^{(\cdot)}$ cadlag), 4.3 (no unmeasured confounding) and 8.4 (no
instantaneous treatment effect) are satisfied. Suppose also that, for every
$t \in [0,\tau]$, $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar{Z}_t$. Then

$$E\int_0^\tau h_t(X(t), \bar{Z}_{t-})\,(dN(t) - \lambda(t)\,dt) = 0$$

for each $h_t$ satisfying Restriction 8.1. Thus, if $D_\psi$ and $\lambda_\theta$ are correctly
specified parametric models for $D$ and $\lambda$, respectively,

$$P_n \int_0^\tau h_t(X_\psi(t), \bar{Z}_{t-})\,(dN(t) - \lambda_\theta(t)\,dt) = 0,$$

with $P_n$ the empirical measure $P_n X = (1/n)\sum_{i=1}^n X_i$, is an unbiased estimating
equation for $(\theta_0, \psi_0)$, for each $h_t$ satisfying Restriction 8.1. $h_t$ here is
allowed to depend on $\psi$ and $\theta$, as long as it satisfies Restriction 8.1 for
$(\theta_0, \psi_0)$.

As before, $X_\psi(t)$ here is the continuous solution of (7),
$X_\psi'(t) = D_\psi(X_\psi(t), t; \bar{Z}_t)$ with boundary condition $X_\psi(\tau) = Y$. Moreover, as before,
we assume that for all $\psi$ we have existence and uniqueness of such solutions;
Theorem A.1 in the appendix provides sufficient conditions for that.
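
The backward differential equation for $X_\psi$ can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the article's implementation. It uses the shift function $(1 - e^\psi)1_{\{\text{treated at } t\}}$ of Example 5.1 without its survival proviso, a hypothetical patient treated on $[0.5, 1.5)$, and hypothetical values $\tau = 2$, $\psi = -0.4$ and $Y = 3$. Because this $D$ does not depend on $y$, the solution is $Y$ minus $(1 - e^\psi)$ times the time spent on treatment in $(t, \tau]$, which the code compares with a midpoint-rule solution.

```python
import math

TAU, PSI, Y_OBS = 2.0, -0.4, 3.0        # hypothetical values
TREATED_FROM, TREATED_UNTIL = 0.5, 1.5  # hypothetical treatment interval


def treated(t):
    """Indicator that the (hypothetical) patient is treated at time t."""
    return TREATED_FROM <= t < TREATED_UNTIL


def shift(y, t, psi=PSI):
    """Infinitesimal shift function D_psi(y, t) = (1 - e^psi) 1{treated at t}."""
    return (1.0 - math.exp(psi)) * (1.0 if treated(t) else 0.0)


def solve_backward(D, y_final, tau, t, steps=10000):
    """Integrate x'(s) = D(x(s), s) backward from x(tau) = y_final to time t."""
    h = (tau - t) / steps
    x, s = y_final, tau
    for _ in range(steps):
        # One backward step of the explicit midpoint rule.
        x_mid = x - 0.5 * h * D(x, s)
        x = x - h * D(x_mid, s - 0.5 * h)
        s -= h
    return x


def closed_form(t, psi=PSI, y_final=Y_OBS, tau=TAU):
    """Y minus (1 - e^psi) times the treated time in (t, tau]."""
    treated_time = max(0.0, min(TREATED_UNTIL, tau) - max(TREATED_FROM, t))
    return y_final - (1.0 - math.exp(psi)) * treated_time
```

For a shift function that depends on $y$ no closed form is available in general, and only the numerical route remains; the midpoint rule here is one simple choice among many.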

Under local rank preservation (see Section 7), $X(t) = Y^{(t)}$ for each $t$. In
that case Theorem 8.5 follows immediately from Lemma 8.3. However, as
argued in Section 7, local rank preservation is generally considered
implausible.

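Proof of Theorem 8.5. We have to show that

$$\int_0^\tau h_t(X(t), \bar{Z}_{t-})\,(dN(t) - \lambda(t)\,dt)$$

has expectation zero for all $h_t$ satisfying Restriction 8.1. To do that, we
prove that it has the same expectation as

$$\int_0^\tau h_t(Y^{(t-)}, \bar{Z}_{t-})\,(dN(t) - \lambda(t)\,dt),$$

which has expectation zero because of Lemma 8.3. We will first show that
the terms with $dN$ have the same expectation, that is,

$$E\Bigl(\sum_{t \le \tau,\,\Delta N(t)=1} h_t(X(t), \bar{Z}_{t-})\Bigr) = E\Bigl(\sum_{t \le \tau,\,\Delta N(t)=1} h_t(Y^{(t-)}, \bar{Z}_{t-})\Bigr). \tag{8}$$

After that we show that the terms with $\lambda(t)\,dt$ have the same expectation,
that is,

$$E\Bigl(\int_0^\tau h_t(X(t), \bar{Z}_{t-})\,\lambda(t)\,dt\Bigr) = E\Bigl(\int_0^\tau h_t(Y^{(t-)}, \bar{Z}_{t-})\,\lambda(t)\,dt\Bigr). \tag{9}$$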
As we will see below, (8) and (9) have to be proved separately, since we do
not have or expect that $(X(s), \bar{Z}_t) \sim (Y^{(s)}, \bar{Z}_t)$ for $s < t$; we only have this
for $s \ge t$. Therefore, the approximations below have to be chosen carefully.

At first we prove (8), by approximating these sums and showing that the 
approximations have the same expectation. Next we show that the approx- 
imations converge and that (8) follows with Lebesgue's dominated conver- 
gence theorem. 

Define $T_1 = \inf\{t : N(t) = 1\}$, $T_2 = \inf\{t : N(t) = 2\}$, etc., the jump times
of the counting process $N$ in the interval $[0,\tau]$. They are measurable [e.g.,
because of Rogers and Williams (1994), Lemma 74.4]. Note that the number
of jumps in $[0,\tau]$ is almost surely finite because $N$ is integrable (it has a
bounded intensity process). In the following read $h_{T_j}(Y^{(T_j-)}, \bar{Z}_{T_j-}) = 0$ if
there is no $j$th jump of $N$ in the interval $[0,\tau]$.

Next split up the interval $[0,\tau]$ in intervals of equal length: for $K \in \mathbb{N}$ fixed,
put $\tau_k = k\tau/K$, $k = 0, \ldots, K$. Fix $K$ for the moment. The right-hand side of
equation (8) is harder to approximate than the left-hand side, both because
$Y^{(t)}$ does not need to be continuous in $t$ while $X(t)$ does and because knowing
$Y^{(t)}$ and $\bar{Z}_t$ does not imply knowing $Y^{(s)}$ for $s < t$ and we do not have or
expect $(X(s), \bar{Z}_t) \sim (Y^{(s)}, \bar{Z}_t)$ for $s < t$. The approximations we choose are

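$$\sum_{\Delta N(t)=1,\,t \le \tau} h_t(Y^{(t-)}, \bar{Z}_{t-}) = \sum_{j=1}^{\infty} h_{T_j}(Y^{(T_j-)}, \bar{Z}_{T_j-})
\approx \sum_{j=1}^{\infty}\sum_{k=0}^{K-1} 1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(Y^{(\tau_{k+1})}, \bar{Z}_{\tau_k-}) \tag{10}$$

and

$$\sum_{\Delta N(t)=1,\,t \le \tau} h_t(X(t), \bar{Z}_{t-}) = \sum_{j=1}^{\infty} h_{T_j}(X(T_j), \bar{Z}_{T_j-})
\approx \sum_{j=1}^{\infty}\sum_{k=0}^{K-1} 1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(X(\tau_{k+1}), \bar{Z}_{\tau_k-}). \tag{11}$$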

To show that these approximations have the same expectation, we use
that $(X(\tau_{k+1}), \bar{Z}_{\tau_{k+1}}) \sim (Y^{(\tau_{k+1})}, \bar{Z}_{\tau_{k+1}})$. Therefore, also

$$1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(X(\tau_{k+1}), \bar{Z}_{\tau_k-}) \sim 1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(Y^{(\tau_{k+1})}, \bar{Z}_{\tau_k-})$$

[notice that $1_{(\tau_k,\tau_{k+1}]}(T_j)$ is a function of $\bar{Z}_{\tau_{k+1}}$]. Hence, the expectation of
each of the terms on the right-hand side of (10) is equal to the expectation
of the corresponding term on the right-hand side of (11). Since $h_t$ is
bounded [Restriction 8.1(a)] and the expected number of jump times $T_j$ is
finite ($N$ is integrable), this implies that the expectation of the right-hand
side of equation (10) is equal to the expectation of the right-hand side of
equation (11).

Equation (8) follows if the expectation of the approximations in (10) and
(11) converges to the right-hand side and left-hand side of equation (8),
respectively. This convergence is harder to show for (10) than for (11), since
$Y^{(\cdot)}$ may jump [while $X(\cdot)$ does not, by construction]. Fix $j$ for a moment.
Define $\tau_k^K$ and $\tau_{k+1}^K$ as the grid points such that $T_j \in (\tau_k^K, \tau_{k+1}^K]$. As $K \to \infty$,
$\tau_{k+1}^K \downarrow T_j$, and since $Y^{(\cdot)}$ is cadlag, $Y^{(\tau_{k+1}^K)} \to Y^{(T_j)}$. Moreover, as $K \to \infty$,
$\tau_k^K \uparrow T_j$, so that because of Restriction 8.1 on $h$,
$h_{\tau_k^K}(Y^{(\tau_{k+1}^K)}, \bar{Z}_{\tau_k^K-}) \to h_{T_j}(Y^{(T_j)}, \bar{Z}_{T_j-})$. Combining this for all $j$ leads to

$$\sum_{j=1}^{\infty}\sum_{k=0}^{K-1} 1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(Y^{(\tau_{k+1})}, \bar{Z}_{\tau_k-}) \to \sum_{j=1}^{\infty} h_{T_j}(Y^{(T_j)}, \bar{Z}_{T_j-})$$

as $K \to \infty$ for every $\omega$ for which the number of jumps of $N$ is finite, so for
almost every $\omega$. $h_t$ is bounded [Restriction 8.1(a)] and the left-hand side
is bounded by the number of jumps of $N$ times this bound. The expectation
of that is finite because $N$ is integrable. Thus, Lebesgue's dominated
convergence theorem can be applied, and

$$E\Bigl(\sum_{j=1}^{\infty}\sum_{k=0}^{K-1} 1_{(\tau_k,\tau_{k+1}]}(T_j)\, h_{\tau_k}(Y^{(\tau_{k+1})}, \bar{Z}_{\tau_k-})\Bigr) \to E\Bigl(\sum_{j=1}^{\infty} h_{T_j}(Y^{(T_j)}, \bar{Z}_{T_j-})\Bigr) \tag{12}$$

as $K \to \infty$. Because with probability one $Y^{(\cdot)}$ and $N$ do not jump at the
same time (Assumption 8.4 of no instantaneous treatment effect),

$$\sum_{j=1}^{\infty} h_{T_j}(Y^{(T_j)}, \bar{Z}_{T_j-}) = \sum_{j=1}^{\infty} h_{T_j}(Y^{(T_j-)}, \bar{Z}_{T_j-}) \quad \text{a.s.},$$

so that we can replace $Y^{(T_j)}$ by $Y^{(T_j-)}$ on the right-hand side of (12).
Therefore, indeed, the expectation of the approximation in (10) converges to the
expectation of the left-hand side of (10). The same reasoning shows this
for (11). Here less caution is necessary since $X(t)$ is continuous in $t$. That
concludes the proof of equation (8).

Next we prove (9), also by approximation. Here, too, we show that the ap- 
proximations have the same expectation and that (9) follows with Lebesgue's 
dominated convergence theorem. 

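Divide the interval $[0,\tau]$ as above. The approximations we choose here
are

$$\int_0^\tau h_t(Y^{(t-)}, \bar{Z}_{t-})\,\lambda(t)\,dt \approx \sum_{k=0}^{K-1} h_{\tau_k}(Y^{(\tau_k)}, \bar{Z}_{\tau_k-})\,\lambda(\tau_k)\,(\tau_{k+1} - \tau_k) \tag{13}$$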






and

$$\int_0^\tau h_t(X(t), \bar{Z}_{t-})\,\lambda(t)\,dt \approx \sum_{k=0}^{K-1} h_{\tau_k}(X(\tau_k), \bar{Z}_{\tau_k-})\,\lambda(\tau_k)\,(\tau_{k+1} - \tau_k). \tag{14}$$

Because $(X(\tau_k), \bar{Z}_{\tau_k}) \sim (Y^{(\tau_k)}, \bar{Z}_{\tau_k})$ and $\lambda(\tau_k)$ is a measurable function of
$\bar{Z}_{\tau_k}$ (Assumption 4.1, bounded intensity process), the expectation of each of
the terms in (13) is equal to the expectation of the corresponding term in
(14). Thus, the expectations of these approximations are equal.
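
The grid approximation in (13) can be tried out on a single path. The sketch below is illustrative only: it assumes a hypothetical cadlag path $t \mapsto Y^{(t)}$ with one jump at $t = 1$, a hypothetical continuous intensity $0.5 + 0.1t$, $\tau = 2$, and the bounded weight $h_t(y) = \tanh(y)$ with no covariate dependence. For these choices the integral on the left of (13) is available in closed form, so the error of the grid sum can be inspected as $K$ grows.

```python
import math

TAU = 2.0  # hypothetical end of follow-up


def y_counterfactual(t):
    """Hypothetical cadlag path t -> Y^(t) with a single jump at t = 1."""
    return 1.0 if t < 1.0 else 2.0


def intensity(t):
    """Hypothetical bounded, continuous intensity lambda(t)."""
    return 0.5 + 0.1 * t


def h(y):
    """Bounded weight function; no dependence on t or covariates here."""
    return math.tanh(y)


def grid_sum(K):
    """Right-hand side of (13) on the grid tau_k = k * TAU / K."""
    total = 0.0
    for k in range(K):
        tau_k = k * TAU / K
        total += h(y_counterfactual(tau_k)) * intensity(tau_k) * (TAU / K)
    return total


def exact_integral():
    """int_0^1 tanh(1)(0.5 + 0.1t) dt + int_1^2 tanh(2)(0.5 + 0.1t) dt."""
    return 0.55 * math.tanh(1.0) + 0.65 * math.tanh(2.0)
```

The single jump of the path affects at most one grid cell, which is why the left-endpoint sum still converges as the mesh $\tau/K$ shrinks.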

Equation (9) follows if the expectation of the approximations in (13)
and (14) converges to the right-hand side and left-hand side of equation (9),
respectively. This convergence is also harder to show for (13) than for (14)
because of possible discontinuities of $Y^{(\cdot)}$. First notice that as $K \to \infty$, for
$t$ fixed,

$$\sum_{k=0}^{K-1} 1_{(\tau_k,\tau_{k+1}]}(t)\, h_{\tau_k}(Y^{(\tau_k)}, \bar{Z}_{\tau_k-})\,\lambda(\tau_k) \to h_t(Y^{(t-)}, \bar{Z}_{t-})\,\lambda(t)$$

for every $\omega \in \Omega$ fixed and for every $t \le \tau$: $Y^{(\cdot)}$ has limits from the left
(Assumption 4.2), so that as $\tau_k \uparrow t$, Restriction 8.1(b) on $h$ can be used,
and $\lambda$ is continuous from the left [Assumption 4.1(b)]. Taking integrals and
applying Lebesgue's dominated convergence theorem [$h_t$ and $\lambda$ are bounded
because of Restriction 8.1(a) and Assumption 4.1(a), resp.] leads to

$$\sum_{k=0}^{K-1} h_{\tau_k}(Y^{(\tau_k)}, \bar{Z}_{\tau_k-})\,\lambda(\tau_k)\,(\tau_{k+1} - \tau_k) \to \int_0^\tau h_t(Y^{(t-)}, \bar{Z}_{t-})\,\lambda(t)\,dt$$

for every $\omega \in \Omega$. As both $h$ and $\lambda$ are bounded, Lebesgue's dominated
convergence theorem guarantees that indeed the expectation of the approximation
in (13) converges to the expectation of the left-hand side of (13). The same
reasoning shows this for (14), which concludes the proof of equation (9) and
Theorem 8.5. □



Lok (2001) shows that if the rest of the conditions in this section are
satisfied, Assumption 8.4 (treatment does not instantaneously affect the
outcome of interest) is a necessary condition for Theorem 8.5.

Example 8.6 (Survival of AIDS patients and the Weibull proportional
hazards model). Consider the setting of Examples 5.1 and 5.2 and define
$N(t) = 1$ if prophylaxis treatment started at or before time $t$ and 0 otherwise.
Suppose that initiation of prophylaxis treatment can be correctly modeled
with the time-dependent Weibull proportional hazards model

$$\lambda_{\xi,\gamma,\theta}(t) = 1_{\{\text{at risk at } t\}}\,\xi\gamma t^{\gamma-1} e^{\theta_1 I_{AZT} + \theta_2 I_{PCP}(t)},$$

where $I_{PCP}(t)$ equals 1 if the patient had PCP before time $t$ and 0 otherwise,
and $\xi$ and $\gamma$ are greater than zero [for more about the Weibull proportional
hazards model and its applications see, e.g., Collett (1994)]. If the patient
died before $t$ or prophylaxis treatment already started before, the patient
is not "at risk" for initiation of treatment and, thus, $\lambda$ equals 0. Then the
(partial) score equations for estimation of $(\xi, \gamma, \theta)$ are

$$P_n \int_0^\tau \Bigl(\frac{1}{\xi},\ \frac{1}{\gamma} + \log t,\ I_{AZT},\ I_{PCP}(t)\Bigr)^{\top} (dN(t) - \lambda_{\xi,\gamma,\theta}(t)\,dt) = 0.$$
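
The weights in these score equations are the partial derivatives of the log intensity with respect to the parameters. The following sketch checks this numerically for a patient who is at risk. It is an illustration only: the function names and the parameter values used in any call are hypothetical, and the covariate indicators are passed in as plain numbers.

```python
import math


def log_intensity(t, xi, gamma, theta1, theta2, azt, pcp):
    """log of xi * gamma * t^(gamma-1) * exp(theta1*azt + theta2*pcp), at risk."""
    return (math.log(xi) + math.log(gamma) + (gamma - 1.0) * math.log(t)
            + theta1 * azt + theta2 * pcp)


def score_weights(t, xi, gamma, azt, pcp):
    """Analytic gradient of the log intensity in (xi, gamma, theta1, theta2)."""
    return [1.0 / xi, 1.0 / gamma + math.log(t), float(azt), float(pcp)]


def numeric_gradient(t, params, azt, pcp, eps=1e-6):
    """Central finite differences of log_intensity in the four parameters."""
    grads = []
    for i in range(4):
        up = list(params)
        dn = list(params)
        up[i] += eps
        dn[i] -= eps
        diff = (log_intensity(t, *up, azt, pcp)
                - log_intensity(t, *dn, azt, pcp))
        grads.append(diff / (2.0 * eps))
    return grads
```

For instance, `score_weights(1.7, 0.8, 1.5, 1, 0)` and `numeric_gradient(1.7, [0.8, 1.5, 0.3, -0.2], 1, 0)` should agree up to finite-difference error.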
Such estimating equations can also be written down for the model including
$X_\psi(t)$:

$$\lambda_{\xi,\gamma,\theta,\alpha,\psi}(t) = 1_{\{\text{at risk at } t\}}\,\xi\gamma t^{\gamma-1} e^{\theta_1 I_{AZT} + \theta_2 I_{PCP}(t) + \alpha X_\psi(t)}.$$

Robins (1998b) proposes to estimate the parameters in a model like this by
choosing those parameters $(\xi, \gamma, \theta, \psi)$ which maximize the likelihood when
$X_\psi$ is considered fixed and known, and for which $\hat{\alpha}(\psi) = 0$: for the true $\psi$,
$X_\psi(t) = X(t) \sim Y^{(t)}$ does not contribute to the model for treatment changes
(under no unmeasured confounding). To make the connection with the
estimators in the current article, notice that this leads to the same estimators
as the ones that solve the estimating equations arising from the likelihood
when $X_\psi$ is considered fixed and known, with $\alpha$ put to zero. More precisely,
since we know that the true $\alpha$ is equal to 0, we put $\alpha$ equal to 0 and get the
estimating equations

$$P_n \int_0^\tau \Bigl(\frac{1}{\xi},\ \frac{1}{\gamma} + \log t,\ I_{AZT},\ I_{PCP}(t),\ X_\psi(t)\Bigr)^{\top} (dN(t) - \lambda_{\xi,\gamma,\theta}(t)\,dt) = 0$$

for the parameter $\psi$ (and thus also for $D$) and the (nuisance) parameters
$(\xi, \gamma, \theta)$. These estimating equations are of the form of Theorem 8.5,

$$P_n \int_0^\tau h_t(X_\psi(t), \bar{Z}_{t-})\,(dN(t) - \lambda_{\xi,\gamma,\theta}(t)\,dt) = 0,$$

but the function $h_t$ here is not bounded and $\lambda$ need not be bounded (if $\gamma < 1$),
so unbiasedness does not follow immediately from Theorem 8.5. However,
we could restrict the interval $[0,\tau]$ to $[\varepsilon,\tau]$ for $\varepsilon > 0$ (to assure that $\lambda$ is
bounded) and $\log t$ can be approximated by the bounded functions $\log t \vee C$
($C \to -\infty$) (to make $h_t$ bounded), which all lead to unbiased estimating
equations because of Theorem 8.5. The above estimating equations are then
also unbiased because of Lebesgue's dominated convergence theorem [the
dominating function is integrable since

$$E\int_0^\tau |\log t|\,(dN(t) + \lambda_{\xi,\gamma,\theta}(t)\,dt) = 2E\int_0^\tau |\log t|\,\lambda_{\xi,\gamma,\theta}(t)\,dt \le 2\xi\gamma e^{|\theta_1| + |\theta_2|}\int_0^\tau |\log t|\,t^{\gamma-1}\,dt,$$

which is finite since $\gamma > 0$].
Under the model for $D$ of Example 5.1,

$$D_\psi(y, t; \bar{Z}_t) = (1 - e^{\psi})\,1_{\{\text{treated at } t\}},$$

if the patient did not die before time $t$. In that case these are five unbiased
estimating equations for five unknown parameters. If the parameter $\psi$ is
of dimension greater than 1, more unbiased estimating equations can be
constructed by adding more terms of the form $\alpha f(X_\psi(t), \bar{Z}_{t-})$.

9. $X(t)$ does not predict treatment changes: a martingale result. We
show that, under no unmeasured confounding, just as $Y^{(t)}$, $X(t)$ does not
predict treatment changes, given past treatment and covariate history $\bar{Z}_{t-}$.
We could hope for that since $X(t) \sim Y^{(t)}$ given $\bar{Z}_t$ (see Section 6). The
formal statement is (compare with Assumption 4.3, no unmeasured
confounding) the following: the intensity process $\lambda(t)$ of $N$ with respect to
$\sigma(\bar{Z}_t)$ is also the intensity process of $N$ with respect to $\sigma(\bar{Z}_t, X(t))^a$. Then
$M(t) = N(t) - \int_0^t \lambda(s)\,ds$ is also a martingale with respect to $\sigma(\bar{Z}_t, X(t))^a$.
That will be useful later when we study the behavior of estimators $\hat{\theta}$ and
$\hat{\psi}$ which are constructed with estimating equations of the form of
Theorem 8.5, $P_n \int_0^\tau h_t(X_\psi(t), \bar{Z}_{t-})(dN(t) - \lambda_\theta(t)\,dt) = 0$. For example, we can
use the fact that usually $\int_0^t H(s)\,dM(s)$ is a martingale if $M$ is a
martingale and $H$ a predictable process; a sufficient condition for this is that
$E\int |H(s)|\,|dM(s)| < \infty$ [see, e.g., Andersen et al. (1993)]. Hence, all estimating
equations of the above form which we saw before are in fact martingales
for $(\theta, \psi) = (\theta_0, \psi_0)$.

Before going on, we first clarify why $\sigma(\bar{Z}_t, X(t))$ is indeed a filtration.
For $s < t$, $X(s)$ is a deterministic (though unknown) function of $(\bar{Z}_{t-}, X(t))$
(i.e., if solutions to the differential equation with $D$ are unique; see, e.g.,
Theorem A.1 in the Appendix). Similarly, for $s < t$, $X(t)$ is a deterministic
function of $(\bar{Z}_{t-}, X(s))$. In the rest of this article we will assume that
these functions are measurable functions on $\bar{\mathcal{Z}}_{t-} \times \mathbb{R}$ (sufficient conditions
for that are that the infinitesimal shift-function $D$ satisfies regularity
Assumption 9.1 below and that for each $\omega \in \Omega$, $Z$ only jumps finitely many
times; see Appendix C, Lemma C.1). Thus,

$$\sigma(\bar{Z}_t, X(t)) = \sigma(\bar{Z}_t, (X(s) : s \le t)) = \sigma(\bar{Z}_t, X(0)). \tag{15}$$

We will use the filtration $\sigma(\bar{Z}_t, X(t))$ below, keeping in mind that it is indeed
a filtration and satisfies equation (15).

In the rest of this section we assume that the infinitesimal shift-function
$D$ satisfies the following regularity condition:
Assumption 9.1 (Regularity of the infinitesimal shift-function $D$).

(a) (Continuity between the jump times of $Z$). If $Z$ does not jump in
$(t_1, t_2)$, then $D(y, t; \bar{Z}_t)$ is continuous in $(y, t)$ on $[t_1, t_2)$ and can be
continuously extended to $[t_1, t_2]$.

(b) (Boundedness). For each $\omega \in \Omega$, there exists a constant $C(\omega)$ such
that $|D(y, t; \bar{Z}_t)| \le C(\omega)$ for all $t \in [0,\tau]$ and all $y$.

(c) (Lipschitz continuity). For each $\omega \in \Omega$, there exist constants $L_1(\omega)$
and $L_2(\omega)$ with

$$|D(y, t; \bar{Z}_t) - D(z, t; \bar{Z}_t)| \le L_1(\omega)\,|y - z|$$

for all $t \in [0,\tau]$ and all $y, z$ and

$$|D(y, t; \bar{Z}_t) - D(y, s; \bar{Z}_s)| \le L_2(\omega)\,|t - s|$$

if $s < t$ and $Z$ does not jump in $(s, t]$.

Most regularity conditions on $D$ here are satisfied for the $D$'s from
Appendix B [see also Lok (2001, 2004)]. Only the second Lipschitz condition
is extra. The Lipschitz conditions are satisfied, for example, if, in between
the jump times of $Z$, $D$ is continuously differentiable with respect to $y$ and
$t$ with derivatives which are bounded for every fixed $\omega \in \Omega$.

The next theorem states that $M$ is indeed also a martingale with respect
to $\sigma(\bar{Z}_t, X(t))^a$:

Theorem 9.2. Suppose that the conditions of Theorem 8.5 hold:
Assumptions 4.1 (bounded intensity process), 4.2 ($Y^{(\cdot)}$ cadlag), 4.3 (no
unmeasured confounding), 8.4 (no instantaneous treatment effect) and for
every $t \in [0,\tau]$, $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar{Z}_t$. Suppose,
furthermore, that for each $\omega \in \Omega$, $Z$ jumps at most finitely many times, and
that $D$ satisfies regularity Condition 9.1. Then the intensity process $\lambda(t)$ of
$N$ with respect to $\sigma(\bar{Z}_t)$ is also the intensity process of $N$ with respect to the
filtration $\sigma(\bar{Z}_t, X(t))^a$.

Recall that in Section 6 we already mentioned that, under regularity
conditions, $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as
$Y^{(t)}$ given $\bar{Z}_t$.

Under local rank preservation (see Section 7), $X(t) = Y^{(t)}$ for each $t$. In
that case Theorem 9.2 would be the same as the Assumption of no
unmeasured confounding 4.3. However, as argued in Section 7, local rank
preservation is generally considered implausible.

Proof of Theorem 9.2. Because of Assumption 4.1, $\Lambda(t) = \int_0^t \lambda(s)\,ds$
is predictable with respect to $\sigma(\bar{Z}_t)$, so then it is also predictable with respect
to the larger filtration $\sigma(\bar{Z}_t, X(t))^a$. We still need to prove that $M$ is a
martingale with respect to $\sigma(\bar{Z}_t, X(t))^a$. Since a cadlag martingale with respect
to some filtration is also a martingale with respect to its usual augmentation
[see Rogers and Williams (1994), Lemma 67.10], it suffices to prove that $M$
is a martingale with respect to $\sigma(\bar{Z}_t, X(t))$. Thus we need to prove that, for
$t_2 > t_1$,

$$E[M(t_2) - M(t_1) \mid \bar{Z}_{t_1}, X(t_1)] = 0.$$

This is not immediate, since we do not have or expect that
$(X(t_1), \bar{Z}_{t_2}) \sim (Y^{(t_1)}, \bar{Z}_{t_2})$ if $t_1 < t_2$.

By the definition of conditional expectation, the above is the same as

$$\int_B (M(t_2) - M(t_1))\,dP = 0 \tag{16}$$

for all $B \in \sigma(\bar{Z}_{t_1}, X(t_1))$. Because of Theorem 34.1 in Billingsley (1986), it
is sufficient to consider $B$'s forming a $\pi$-system generating $\sigma(\bar{Z}_{t_1}, X(t_1))$.
With $\sigma_1$ the $\sigma$-algebra on $\bar{\mathcal{Z}}_{t_1}$,

$$\{\omega \in \Omega : \bar{Z}_{t_1} \in A \text{ and } X(t_1) \in (x_1, x_2) : A \in \sigma_1 \text{ and } x_1 < x_2 \in \mathbb{R}\}$$

is such a $\pi$-system: it is closed under the formation of finite intersections
and generates $\sigma(\bar{Z}_{t_1}, X(t_1))$. Therefore, we only consider $B$'s of this form.

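We prove (16) for any $B = \{\bar{Z}_{t_1} \in A\} \cap \{X(t_1) \in (x_1, x_2)\}$. Let $1^{(n)}_{(x_1,x_2)}$ be
any approximation of $1_{(x_1,x_2)}$ which is continuous for every fixed $n$, with
$1^{(n)}_{(x_1,x_2)}(x) \to 1_{(x_1,x_2)}(x)$ for every $x$ as $n \to \infty$ and $|1^{(n)}_{(x_1,x_2)}(x)| \le 1$ for all $x$ and
$n$. Then

$$\begin{aligned}
\int_B (M(t_2) - M(t_1))\,dP
&= E\bigl(1_B \cdot (M(t_2) - M(t_1))\bigr) \\
&= E\Bigl(1_A(\bar{Z}_{t_1})\,1_{(x_1,x_2)}(X(t_1)) \int_{(t_1,t_2]} dM(t)\Bigr) \\
&= E\int 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1_{(x_1,x_2)}(X(t_1))\,dM(t) \\
&= E\int 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1}) \lim_{n\to\infty} 1^{(n)}_{(x_1,x_2)}(X(t_1))\,dM(t) \\
&= E \lim_{n\to\infty} \int 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}(X(t_1))\,(dN(t) - \lambda(t)\,dt) \\
&= \lim_{n\to\infty} E\int 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}(X(t_1))\,dM(t).
\end{aligned}$$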

The last two equalities follow from Lebesgue's dominated convergence
theorem [the prior to last equality since, for $\omega \in \Omega$ fixed, the integral is bounded
since $N$ is finite and $\lambda$ is bounded; the last equality since the integrals are
all bounded by $N(\tau) + \int_0^\tau \lambda(t)\,dt$, whose expectation is bounded by $2\tau$ times
the upper bound of $\lambda$]. Equation (16) and the result of the theorem would
follow from Theorem 8.5 if

$$1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}(X(t_1)) = h_t^{(n)}(X(t), \bar{Z}_{t-})$$

for some $h_t^{(n)}$ from $\mathbb{R} \times \bar{\mathcal{Z}}_{t-}$ to $\mathbb{R}$ satisfying Restriction 8.1 for each fixed $n$.
In principle, this seems possible, since $X(t_1)$ is a function of $X(t)$ and $\bar{Z}_{t-}$
for every $t > t_1$.

Indeed, under the conditions above on $D$ and $Z$, it is possible to find
such an $h_t^{(n)}$, as follows. Write $x(\cdot\,; t_0, x_0)$ for the solution of the differential
equation

$$x'(t) = D(x(t), t; \bar{Z}_t)$$

with (final or initial, depending on $t$) condition $x(t_0) = x_0$. Existence and
uniqueness of $x(\cdot\,; t_0, x_0)$ on $[0,\tau]$ for every fixed $\omega$ follows from
Theorem A.1 in Appendix A. In this notation,

$$1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}(X(t_1))
= 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}\bigl(x(t_1; t, X(t))\bigr) = h_t^{(n)}(X(t), \bar{Z}_{t-})$$

with

$$h_t^{(n)}(y, \bar{Z}_{t-}) = 1_{(t_1,t_2]}(t)\,1_A(\bar{Z}_{t_1})\,1^{(n)}_{(x_1,x_2)}\bigl(x(t_1; t, y)\bigr). \tag{17}$$
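
The proof allows any continuous approximation of the indicator of $(x_1, x_2)$ that is bounded by 1 and converges pointwise. One concrete choice, given here purely as an illustrative assumption and not taken from the original text, is a piecewise-linear ramp whose slope grows with $n$:

```python
def soft_indicator(x, x1, x2, n):
    """Continuous, piecewise-linear approximation of the indicator of (x1, x2).

    The value is 0 outside (x1, x2) and at both endpoints, rises linearly
    with slope n just inside each endpoint, and is capped at 1.
    """
    return min(1.0, max(0.0, n * min(x - x1, x2 - x)))
```

For every fixed `x` strictly inside the interval the value equals 1 once `n` is large enough, and at or outside the endpoints it is 0 for every `n`, so the pointwise limit is the indicator of the open interval.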

We have to show that (17) satisfies Restriction 8.1. First we show that, for
$t$ fixed, $h_t^{(n)} : \mathbb{R} \times \bar{\mathcal{Z}}_{t-} \to \mathbb{R}$ is measurable. From (17) we see that this is the
case if $x(t_1; t, \cdot) : \mathbb{R} \times \bar{\mathcal{Z}}_{t-} \to \mathbb{R}$ is measurable, which follows immediately from
Lemma C.1. Restriction 8.1(a) is immediate, since $h_t^{(n)}$ is bounded by 1. For
Restriction 8.1(b), we have to prove that, for all $\omega \in \Omega$,
$h_t^{(n)}(y, \bar{Z}_{t-}(\omega)) \to h_{t_0}^{(n)}(y_0, \bar{Z}_{t_0-}(\omega))$ when $y \to y_0$ and $t \uparrow t_0$. Fix $\omega \in \Omega$. We consider three
different kinds of $t_0$. If $t_0 \le t_1$ and $t \uparrow t_0$, $h_t^{(n)}(\cdot) = 0 = h_{t_0}^{(n)}(\cdot)$, so that the
convergence follows immediately. If $t_0 > t_2$ and $t \uparrow t_0$, eventually
$h_t^{(n)}(\cdot) = 0 = h_{t_0}^{(n)}(\cdot)$, so that the convergence also follows immediately. If
$t_0 \in (t_1, t_2]$, convergence of the first two factors is immediate. For the last
factor, we need differential equation theory. $1^{(n)}_{(x_1,x_2)}$ is continuous. Thus, to
prove that the last factor in equation (17) converges, it suffices to show that
$x(t_1; t, y) \to x(t_1; t_0, y_0)$ as $t \uparrow t_0$ and $y \to y_0$.

Fix $\omega \in \Omega$. For $t$ close enough to $t_0$, we compare the solution of the
differential equation with final condition $y$ at $t$ with the solution of the differential
equation with final condition $y_0$ at $t_0$; we look at the value of the solution at
the time point $t_1$ before both $t$ and $t_0$. First, notice that because of existence
and uniqueness of solutions (Theorem A.1), the solution of the differential
equation with final condition $y$ at $t$ takes a unique value $\tilde{y} = x(t_0; t, y)$ at $t_0$.
Since $x$ is differentiable with respect to its first argument with derivative $D$
and $D$ is bounded by $C(\omega)$ [Assumption 9.1(b)], $\tilde{y}$ is not far from $y$ if $t$ is
not far from $t_0$:

$$|\tilde{y} - y| = |x(t_0; t, y) - y| \le C(\omega)\,|t - t_0|. \tag{18}$$

Next, notice that, again because of existence and uniqueness of solutions, the
value at $t_1$ of the solution of the differential equation with final condition $y$
at $t$ is the same as the value at $t_1$ of the solution of the differential equation
with final condition $\tilde{y} = x(t_0; t, y)$ at $t_0$. This observation implies that

$$\begin{aligned}
|x(t_1; t, y) - x(t_1; t_0, y_0)| &= |x(t_1; t_0, \tilde{y}) - x(t_1; t_0, y_0)| \\
&\le e^{L_1(\omega)|t_1 - t_0|}\,|\tilde{y} - y_0| \\
&\le e^{L_1(\omega)|t_1 - t_0|}\bigl(C(\omega)\,|t - t_0| + |y - y_0|\bigr).
\end{aligned} \tag{19}$$

For the first inequality, we use Corollary A.3 and Assumption 9.1 [notice
that possible jumps of $D$ at the jump times of $Z$ do not matter here since
one can split up the interval, so if, e.g., there is just one jump at $t \in (t_1, t_0]$,
one gets a factor

$$e^{L_1(\omega)|t_1 - t|} \cdot e^{L_1(\omega)|t - t_0|} = e^{L_1(\omega)|t_1 - t_0|},$$

etc. (a formal proof can be given with induction since, with $\omega \in \Omega$ still fixed,
there are only finitely many jumps of $Z$)]. For the last inequality, we use
equation (18). If $y \to y_0$ and $t \uparrow t_0$, the bound in equation (19) converges to 0
for every fixed $\omega \in \Omega$. Thus, indeed, if $y \to y_0$ and $t \uparrow t_0$, $x(t_1; t, y)$ converges
to $x(t_1; t_0, y_0)$. This finishes the proof. □
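
The bound (19) can be checked numerically for a simple shift function. The sketch below is an illustration under assumptions that are not in the original text: it takes the hypothetical choice $D(y, t) = \sin(y)$, for which the constants of Assumption 9.1 can be taken as $C = 1$ and $L_1 = 1$ (and there are no jumps of $Z$), and it solves the differential equation with a classical Runge-Kutta scheme.

```python
import math

C_BOUND = 1.0  # bound on |D| for the assumed D(y, t) = sin(y)
L1 = 1.0       # Lipschitz constant in y for the assumed D


def D(y, t):
    """Hypothetical shift function: bounded by 1 and Lipschitz in y."""
    return math.sin(y)


def solve(t_start, x_start, t_target, steps=2000):
    """Classical RK4 for x' = D(x, t), forward or backward in time."""
    h = (t_target - t_start) / steps
    t, x = t_start, x_start
    for _ in range(steps):
        k1 = D(x, t)
        k2 = D(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = D(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = D(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x


def compare(t1, t0, y0, t, y):
    """Return (left-hand side, right-hand side) of the bound (19)."""
    lhs = abs(solve(t, y, t1) - solve(t0, y0, t1))
    bound = math.exp(L1 * abs(t1 - t0)) * (C_BOUND * abs(t - t0) + abs(y - y0))
    return lhs, bound
```

Calling `compare(0.0, 1.0, 0.5, 0.9, 0.6)` and then moving `(t, y)` closer to `(t0, y0)` shows both sides shrinking together, which is the continuity statement needed for Restriction 8.1(b).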

10. Consistency and asymptotic normality. The estimating equations for
$(\theta, \psi)$ from Section 8 were all of the form $P_n g_{\theta,\psi}(Y, \bar{Z}) = 0$. In the current
section we choose the dimension of $g$ the same as the dimension of $(\theta, \psi)$.
Estimating equations of this form are well known. Theorem 10.2 below is an
example of asymptotic theory in the setting of this article, with conditions
in terms of $h$ and the intensity process $\lambda$. Notice, however, that these
conditions are in fact stronger than necessary. For more theory about these types
of estimating equations and less restrictive conditions, see Van der Vaart
(1998), Chapter 5. In particular, conditions could be weakened by
considering the estimating equations as a whole instead of looking at $h$ and $\lambda$
separately (see, e.g., Example 10.5).

We only consider smooth $h_t^{\theta,\psi}$:
Restriction 10.1. The functions $h_t^{\theta,\psi} : \mathbb{R} \times \bar{\mathcal{Z}}_{t-} \to \mathbb{R}^m$ are measurable
and:

(a) Every component of $h_t^{\theta_0,\psi_0}$ satisfies Restriction 8.1.

(b) $h_t^{\theta,\psi}(y, \bar{Z}_{t-})$ is bounded by a constant $C_1$ not depending on $\theta$, $\psi$, $t$, $y$
and $\omega \in \Omega$.

(c) For each $t \in [0,\tau]$ and $\omega \in \Omega$, $(\theta, \psi, y) \to h_t^{\theta,\psi}(y, \bar{Z}_{t-})$ is continuous.

(d) There exists a neighborhood of $(\theta_0, \psi_0)$ such that, for each $t \in [0,\tau]$
and $\omega \in \Omega$, $h_t^{\theta,\psi}(y, \bar{Z}_{t-})$ is continuously differentiable with respect to $\theta$, $\psi$
and $y$, and these derivatives are all bounded by a constant $C_2$ not depending
on $\theta$, $\psi$, $t$, $y$ and $\omega \in \Omega$.

(e) Every component of $\frac{\partial}{\partial(\theta,\psi)}\big|_{(\theta,\psi)=(\theta_0,\psi_0)} h_t^{\theta,\psi}(y, \bar{Z}_{t-})$ satisfies
Restriction 8.1.

Theorem 10.2 (Consistency and asymptotic normality). Suppose that
Assumptions 4.1 (bounded intensity process), 4.2 ($Y^{(\cdot)}$ cadlag), 4.3 (no
unmeasured confounding) and 8.4 (no instantaneous treatment effect) are
satisfied. Suppose also that, for every $t \in [0,\tau]$, $X(t)$ has the same distribution
as $Y^{(t)}$ given $\bar{Z}_t$. From Theorem 8.5 we know that, for $h$ satisfying
Restriction 10.1(a), $(\theta_0, \psi_0)$ is a zero of

$$E\int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})\,(dN(t) - \lambda_\theta(t)\,dt).$$

Suppose now that $(\theta_0, \psi_0)$ is the only zero. Suppose, furthermore, that we
know that $(\theta_0, \psi_0) \in (\Theta, \Psi)$ with $(\Theta, \Psi)$ compact, that $\theta \to \lambda_\theta(t)$ is continuous
for each $t$ and bounded by a constant $C_3$ which does not depend on $(\omega, t, \theta)$,
and that $\psi \to X_\psi(t)$ is continuous for each $t$. Then any sequence of (almost)
zeros $(\hat{\theta}, \hat{\psi})$ of

$$\Psi_n(\theta, \psi) = P_n \int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})\,(dN(t) - \lambda_\theta(t)\,dt),$$

that is, any sequence of estimators $(\hat{\theta}, \hat{\psi})$ such that $\Psi_n(\hat{\theta}, \hat{\psi})$ converges in
probability to zero, is a consistent estimator for $(\theta_0, \psi_0)$ for each $h_t$ satisfying
Restriction 10.1(a)-(c).

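Suppose, moreover, that $\theta \to \lambda_\theta(t)$ is differentiable with respect to $\theta$
with derivative bounded by a constant $C_4$ in a neighborhood of $\theta_0$, and $\psi \to
X_\psi(t)$ is differentiable with respect to $\psi$ with the derivative bounded by a
constant $C_5$ in a neighborhood of $\psi_0$. Then for each $h$ satisfying Restriction 10.1
there is a neighborhood of $(\theta_0, \psi_0)$ such that
$E\int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})(dN(t) - \lambda_\theta(t)\,dt)$ is continuously differentiable with
respect to $(\theta, \psi)$. Suppose, moreover, that the matrix

$$V_0 = E\Bigl(\frac{\partial}{\partial(\theta,\psi)}\Big|_{(\theta,\psi)=(\theta_0,\psi_0)} \int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})\,(dN(t) - \lambda_\theta(t)\,dt)\Bigr)$$

is nonsingular. Then there exists a sequence of (almost) zeros $(\hat{\theta}, \hat{\psi})$ of $\Psi_n$.
Furthermore, any such sequence is asymptotically normal:

$$\sqrt{n}\bigl((\hat{\theta}, \hat{\psi}) - (\theta_0, \psi_0)\bigr) \xrightarrow{d} N\bigl(0, V_0^{-1} W_0 (V_0^{-1})^{\top}\bigr) \tag{20}$$

with $V_0$ the matrix above and, with $a^{\otimes 2} = aa^{\top}$,

$$W_0 = E\Bigl[\Bigl(\int_0^\tau h_t^{\theta_0,\psi_0}(X(t), \bar{Z}_{t-})\,(dN(t) - \lambda(t)\,dt)\Bigr)^{\otimes 2}\Bigr].$$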

Proof. Consistency follows from Theorem 5.9 of Van der Vaart (1998).
Repeatedly applying Lebesgue's dominated convergence theorem shows that
our conditions imply the conditions of the first and second paragraph after
this Theorem 5.9.

The existence of (almost) zeros follows from Van der Vaart and Wellner
(1996), Section 3.9, Problem 9, whose solution is practically given by the
hint below it. This Problem 9 states that if $g : \Theta \times \Psi \to \mathbb{R}^d$ is a
homeomorphism of a neighborhood of $(\theta_0, \psi_0) \in \mathbb{R}^d$ onto a neighborhood of $0 \in \mathbb{R}^d$,
then every continuous $f : \Theta \times \Psi \to \mathbb{R}^d$ for which
$\sup_{(\theta,\psi) \in \Theta \times \Psi} \|f(\theta, \psi) - g(\theta, \psi)\|$ is sufficiently small has at least one zero. In our case
$g(\theta, \psi) = E\int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})(dN(t) - \lambda_\theta(t)\,dt)$ is continuously differentiable in a
neighborhood of $(\theta_0, \psi_0)$ by Restriction 10.1(d) and the assumptions on $\lambda_\theta$ and
$X_\psi(t)$, under which differentiation and integration can be exchanged (twice).
The derivative of this $g(\theta, \psi)$ at $(\theta_0, \psi_0)$ is nonsingular by assumption, and
hence, $g$ is a homeomorphism of a neighborhood of $(\theta_0, \psi_0) \in \mathbb{R}^d$ onto a
neighborhood of $0 \in \mathbb{R}^d$. $\Psi_n(\theta, \psi)$ is continuous in $(\theta, \psi)$ and close enough
to $g(\theta, \psi)$ for large $n$ with probability approaching 1 because of the second
paragraph below Theorem 5.9 in Van der Vaart (1998). Hence, $\Psi_n(\theta, \psi)$ has
a zero with probability approaching 1.

Asymptotic normality follows from Theorem 5.21 of Van der Vaart (1998),
as follows. Define $g_{\theta,\psi}(Y, \bar{Z}_\tau) = \int_0^\tau h_t^{\theta,\psi}(X_\psi(t), \bar{Z}_{t-})(dN(t) - \lambda_\theta(t)\,dt)$, which
is continuously differentiable with respect to $(\theta, \psi)$ in a neighborhood $U$ of
$(\theta_0, \psi_0)$ under our conditions. Making $U$ smaller so that all boundedness
conditions hold on $U$, we define $\dot{g}(Y, \bar{Z}_\tau) = \sup_{(\theta,\psi) \in U} \bigl\|\frac{\partial}{\partial(\theta,\psi)} g_{\theta,\psi}(Y, \bar{Z}_\tau)\bigr\|$, which
is bounded by $(C_2 + C_2 C_5 + C_2)(N(\tau) + C_3\tau) + C_1 C_4 \tau$, a constant plus a
constant times $N(\tau)$. This $\dot{g}(Y, \bar{Z}_\tau)$ is square integrable since $N(\tau)$ is square
integrable: it is well known that counting processes with bounded intensity
processes are square integrable [it follows from, e.g., Proposition II.4.1 of
Andersen et al. (1993)]. For the same reason, $E\|g_{\theta_0,\psi_0}(Y, \bar{Z}_\tau)\|^2 < \infty$. The
remaining conditions of Theorem 5.21 from Van der Vaart (1998) were checked
before, so, indeed, $(\hat{\theta}, \hat{\psi})$ is asymptotically normal with asymptotic
covariance matrix (20). □
The asymptotic variance (20) is often estimated by replacing $(\theta_0, \psi_0)$ by
their estimates and $E$ by $P_n$. Thus, confidence intervals for $\psi_0$ can be
constructed. Also, tests for whether $\psi_0$ has a specific value can be constructed
that way. For more about testing, see Section 11.
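
The plug-in recipe for (20) can be sketched on a deliberately simple estimating function. The example below is an assumption made for illustration and does not use the estimating equations of this article: it takes independent observations $x_i$ and the two-dimensional estimating function $g(x; \mu, s) = (x - \mu, (x - \mu)^2 - s)$, computes the empirical versions of $V_0$ and $W_0$ at the estimates, and forms $V^{-1} W (V^{-1})^{\top}$. All function names are hypothetical.

```python
def sandwich_2x2(V, W):
    """Return V^{-1} W (V^{-1})^T for 2x2 matrices given as nested lists."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    Vinv = [[V[1][1] / det, -V[0][1] / det],
            [-V[1][0] / det, V[0][0] / det]]
    VW = [[sum(Vinv[i][k] * W[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # Multiply by the transpose of Vinv: entry (i, j) uses row j of Vinv.
    return [[sum(VW[i][k] * Vinv[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


def plug_in_variance(xs):
    """Estimates (mu, s) and the plug-in sandwich matrix for the toy g."""
    n = len(xs)
    mu = sum(xs) / n
    s = sum((x - mu) ** 2 for x in xs) / n
    # Per-observation estimating function evaluated at the estimates.
    g = [(x - mu, (x - mu) ** 2 - s) for x in xs]
    # W_hat: empirical mean of the outer product g g^T.
    W = [[sum(gi[a] * gi[b] for gi in g) / n for b in range(2)]
         for a in range(2)]
    # V_hat: empirical mean of the derivative of g in (mu, s).
    V = [[-1.0, 0.0],
         [-2.0 * sum(x - mu for x in xs) / n, -1.0]]
    return (mu, s), sandwich_2x2(V, W)
```

Dividing the returned matrix by $n$ gives an estimated covariance matrix for the estimators; in this toy example the derivative matrix is minus the identity at the estimates, so the sandwich reduces to the empirical $W$.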

One can often simplify the expression for the asymptotic variance in equa- 
tion (20) using Corollary 10.4 below. We use the following lemma: 

Lemma 10.3. Suppose that the conditions of Theorem 9.2 hold:
Assumptions 4.1 (bounded intensity process), 4.2 ($Y^{(\cdot)}$ cadlag), 4.3 (no unmeasured
confounding), 8.4 (no instantaneous treatment effect), for every $t \in [0,\tau]$,
$X(t)$ has the same distribution as $Y^{(t)}$ given $\bar{Z}_t$ (see Section 6), for each
$\omega \in \Omega$, $Z$ jumps at most finitely many times, and $D$ satisfies regularity
Condition 9.1. Write $a \otimes b = ab^{\top}$ and $a^{\otimes 2} = aa^{\top}$. Then for $h_t : \mathbb{R} \times \bar{\mathcal{Z}}_{t-} \to \mathbb{R}^m$
every component of which satisfies Restriction 8.1,

$$E\Bigl(\Bigl(\int_0^\tau h_t(X(t), \bar{Z}_{t-})\,dM(t)\Bigr)^{\otimes 2}\Bigr) = E\Bigl(\int_0^\tau h_t(X(t), \bar{Z}_{t-})^{\otimes 2}\,\lambda(t)\,dt\Bigr).$$

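If, furthermore, $\lambda_\theta$ is a correctly specified model for $\lambda$ such that $\frac{\partial}{\partial\theta}\lambda_\theta$ exists
and $D_\psi$ is a correctly specified model for $D$ such that, for each $t$, $X_\psi(t)$ is
differentiable with respect to $\psi$ at $\psi = \psi_0$, then, for $h_t^{\theta,\psi}$ satisfying
Restriction 10.1,

$$E\Bigl(\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0} \int_0^\tau h_t^{\theta,\psi_0}(X(t), \bar{Z}_{t-})\,(dN(t) - \lambda_\theta(t)\,dt)\Bigr)
= -E\Bigl(\int_0^\tau h_t^{\theta_0,\psi_0}(X(t), \bar{Z}_{t-}) \otimes \frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0} \lambda_\theta(t)\,dt\Bigr)$$

if the left- or right-hand side exists and

$$E\Bigl(\frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0} \int_0^\tau h_t^{\theta_0,\psi}(X_\psi(t), \bar{Z}_{t-})\,dM(t)\Bigr)
= E\Bigl(\int_0^\tau \dot{h}_t(X(t), \bar{Z}_{t-}) \otimes \frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0} X_\psi(t)\,dM(t)\Bigr)$$

if the left- or right-hand side exists, where $\dot{h}_t(y, \bar{Z}_{t-}) = \frac{\partial}{\partial y} h_t^{\theta_0,\psi_0}(y, \bar{Z}_{t-})$.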



Proof. For the first statement, we use counting process theory from
Andersen et al. (1993), Chapter 2. If $M_1$ is a martingale, $\langle M_1\rangle$ (if it exists) is
defined as a predictable process such that $M_1^2 - \langle M_1\rangle$ is a (local) martingale.
If $M_2$ is another martingale, $\langle M_1,M_2\rangle$ (if it exists) is defined as a predictable
process such that $M_1M_2 - \langle M_1,M_2\rangle$ is a (local) martingale. $\langle M_1\rangle$ is called the
predictable variation process of $M_1$ and $\langle M_1,M_2\rangle$ is called the predictable
covariation process of $M_1$ and $M_2$. For vector-valued $M_1$, $\langle M_1\rangle$ is defined as
a predictable process such that $M_1^{\otimes 2} - \langle M_1\rangle$ is a (local) martingale. Hence,
it is a matrix with $\langle M_{1i},M_{1j}\rangle$ at the $i$th row, $j$th column.

As shown in Theorem 9.2, $M(t) = N(t) - \int_0^t \lambda(s)\,ds$ is a martingale with
respect to the filtration $\sigma(\bar Z_t,X(t))^a$. Counting process martingales like
this have compensators: $\langle M\rangle(t) = \int_0^t \lambda(s)\,ds$ [see, e.g., Proposition II.4.1
in Andersen et al. (1993)]. Moreover, if $H_1$ and $H_2$ are (locally) bounded
$\sigma(\bar Z_t,X(t))^a$-predictable processes with values in $\mathbb{R}$, then
$\langle \int_0^\cdot H_1(s)\,dM(s), \int_0^\cdot H_2(s)\,dM(s)\rangle$ exists and

$$\left\langle \int_0^\cdot H_1(s)\,dM(s), \int_0^\cdot H_2(s)\,dM(s)\right\rangle(t) = \int_0^t H_1(s)H_2(s)\lambda(s)\,ds$$

[Proposition II.4.1 or (2.4.9) in Andersen et al. (1993)]. Because $h_t$ satisfies
Restriction 8.1, $h_t(X(t),\bar Z_{t-})$ is a bounded $\sigma(\bar Z_t,X(t))^a$-predictable process
(proof just as in Lemma 8.2). Therefore, the theory above leads to

$$E\left(\int_0^\tau h_t(X(t),\bar Z_{t-})\,dM(t)\right)^{\otimes 2} = E\left\langle \int_0^\cdot h_t(X(t),\bar Z_{t-})\,dM(t)\right\rangle(\tau) = E\left(\int_0^\tau h_t(X(t),\bar Z_{t-})^{\otimes 2}\lambda(t)\,dt\right).$$



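For the second statement, notice that, under the conditions of the lemma,

$$\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}\int_0^\tau h_t^{\theta,\psi_0}(X(t),\bar Z_{t-})\,(dN(t)-\lambda_\theta(t)\,dt) = -\int_0^\tau h_t^{\theta_0,\psi_0}(X(t),\bar Z_{t-}) \otimes \left(\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}\lambda_\theta(t)\right)dt + \int_0^\tau \left(\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}h_t^{\theta,\psi_0}\right)(X(t),\bar Z_{t-})\,(dN(t)-\lambda(t)\,dt),$$

and the expectation of the second term here is equal to zero because of
Theorem 8.5.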

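For the third statement, notice that, under the conditions of the lemma,

$$\frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}\int_0^\tau h_t^{\theta_0,\psi}(X_\psi(t),\bar Z_{t-})\,(dN(t)-\lambda_{\theta_0}(t)\,dt) = \int_0^\tau \left(\dot h(X(t),\bar Z_{t-}) \otimes \frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}X_\psi(t)\right)(dN(t)-\lambda(t)\,dt) + \int_0^\tau \left(\frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}h_t^{\theta_0,\psi}\right)(X(t),\bar Z_{t-})\,(dN(t)-\lambda(t)\,dt)$$

because of the chain rule, and the expectation of the second term is equal
to zero because of Theorem 8.5. □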
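
The first statement of Lemma 10.3 can be checked by simulation in a toy setting. The sketch below is only an illustration under assumptions that are not in the article: a single treatment change at an exponentially distributed time (so the intensity is a constant rate while the patient is at risk), and the deterministic weight $h(t) = t$. In this toy case both sides can be computed by hand and equal $2 - 5/e \approx 0.1606$.

```python
import random


def simulate_once(rng, rate=1.0, tau=1.0):
    """Return (int h dM)^2 and int h^2 * intensity dt for one simulated subject.

    Toy setting (an assumption, not the article's model): one treatment change
    at T ~ Exponential(rate), so N(t) = 1{T <= t} has intensity rate * 1{t <= T},
    and h(t) = t is a deterministic, hence predictable, weight.
    """
    t_change = rng.expovariate(rate)
    at_risk_until = min(t_change, tau)
    # int_0^tau h dN equals h(T) if the jump happens before tau, else 0.
    jump_part = t_change if t_change <= tau else 0.0
    # int_0^tau h(t) * rate * 1{t <= T} dt = rate * at_risk_until^2 / 2.
    compensator_part = rate * at_risk_until ** 2 / 2.0
    martingale_integral = jump_part - compensator_part
    # int_0^tau h(t)^2 * rate * 1{t <= T} dt = rate * at_risk_until^3 / 3.
    variation = rate * at_risk_until ** 3 / 3.0
    return martingale_integral ** 2, variation


def monte_carlo(n=100_000, seed=0):
    """Average both sides of the identity over n simulated subjects."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        squared, variation = simulate_once(rng)
        lhs += squared
        rhs += variation
    return lhs / n, rhs / n


lhs, rhs = monte_carlo()
print(f"E(int h dM)^2 ~ {lhs:.4f},  E int h^2 lambda dt ~ {rhs:.4f}")
```

The two Monte Carlo averages should agree up to simulation error, which is what the predictable variation argument in the proof predicts.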
This lemma simplifies the asymptotic variance formula of the estimators
in equation (20): 

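Corollary 10.4 (Asymptotic variance). Suppose that the conditions
of Theorem 9.2 hold: Assumptions 4.1 (bounded intensity process), 4.2 [$Y^{(\cdot)}$
cadlag], 4.3 (no unmeasured confounding), 8.4 (no instantaneous treatment
effect), for every $t \in [0,\tau]$, $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$
(see Section 6), for each $\omega \in \Omega$, $Z$ jumps at most finitely many times, and $D$
satisfies regularity Condition 9.1. Suppose also that $\lambda_\theta$ is a correctly specified
model for $\lambda$ such that $\frac{\partial}{\partial\theta}\lambda_\theta$ exists and that $D_\psi$ is a correctly specified model
for $D$ such that, for each $t$, $X_\psi(t)$ is differentiable with respect to $\psi$ at
$\psi = \psi_0$. Then if

$$g_{\theta,\psi}(Y,\bar Z) = \int_0^\tau h_t^{\theta,\psi}(X_\psi(t),\bar Z_{t-})\,(dN(t) - \lambda_\theta(t)\,dt)$$

and $h^{\theta,\psi}$ satisfies Restriction 10.1, the asymptotic variance (20) is equal to
$V_0^{-1}W_0(V_0^{-1})^\top$ with

$$W_0 = E\left(\int_0^\tau h_t^{\theta_0,\psi_0}(X(t),\bar Z_{t-})^{\otimes 2}\,\lambda(t)\,dt\right)$$

and $V_0 = (V_{0\theta}\ V_{0\psi})$ with

$$V_{0\theta} = -E\left(\int_0^\tau h_t^{\theta_0,\psi_0}(X(t),\bar Z_{t-}) \otimes \left(\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}\lambda_\theta(t)\right)dt\right)$$

and, with $\dot h(y,\bar Z_{t-}) = \frac{\partial}{\partial y}h_t^{\theta_0,\psi_0}(y,\bar Z_{t-})$,

$$V_{0\psi} = E\left(\int_0^\tau \left(\dot h(X(t),\bar Z_{t-}) \otimes \frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}X_\psi(t)\right)(dN(t)-\lambda(t)\,dt)\right).$$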

We conclude this section with an example to see the machinery work 
in practice. Notice that the boundedness conditions of Theorem 10.2 are 
somewhat too restrictive for this example, but that the results hold true 
under these weaker restrictions, too. 

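Lemma 10.5 (Survival of AIDS patients and the Weibull proportional
hazards model). Consider the setting of Example 8.6, and suppose that
the assumptions of Section 9 are satisfied. In Example 8.6,

$$\lambda_{\xi,\gamma,\theta}(t) = 1_{\{\text{at risk at } t\}}\,\xi\gamma t^{\gamma-1}e^{\theta_1 I_{AZT} + \theta_2 I_{PCP}(t)},$$

where at risk means at risk for initiation of prophylaxis treatment. $X_\psi(t)$ is
the solution to the differential equation $X_\psi'(t) = D_\psi(X_\psi(t),t;\bar Z_t)$ with final
condition $X_\psi(\tau) = Y$ and $D_\psi(y,t;\bar Z_t) = (1 - e^\psi)1_{\{\text{treated at } t\}}$; so

$$X_\psi(t) = t + \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds.$$

In Example 8.6 we already saw that $(\xi_0,\gamma_0,\theta_0,\psi_0)$ is a zero of

$$E\int_0^\tau \left(\frac{1}{\xi},\ \frac{1}{\gamma} + \log t,\ I_{AZT},\ I_{PCP}(t),\ X_\psi(t)\right)^\top (dN(t)-\lambda_{\xi,\gamma,\theta}(t)\,dt).$$

Suppose now that $(\xi_0,\gamma_0,\theta_0,\psi_0)$ is the only zero. Suppose, furthermore, that
the survival time $Y$ takes values in a compact space $[0,y_0] \subset \mathbb{R}$ and that we
know that $(\xi_0,\gamma_0,\theta_0,\psi_0) \in \Xi \times \Gamma \times \Theta \times \Psi$, with $\Xi \subset (0,\infty)$, $\Gamma \subset (0,\infty)$, $\Theta$
and $\Psi$ all four compact (note that this implies that $\xi$ and $\gamma$ are bounded
away from 0). Then any sequence of (almost) zeros $(\hat\xi,\hat\gamma,\hat\theta,\hat\psi)$ of

(21) $$\Psi_n(\xi,\gamma,\theta,\psi) = \mathbb{P}_n\int_0^\tau \left(\frac{1}{\xi},\ \frac{1}{\gamma} + \log t,\ I_{AZT},\ I_{PCP}(t),\ X_\psi(t)\right)^\top (dN(t) - \lambda_{\xi,\gamma,\theta}(t)\,dt),$$

that is, any sequence of estimators $(\hat\xi,\hat\gamma,\hat\theta,\hat\psi)$ such that $\Psi_n(\hat\xi,\hat\gamma,\hat\theta,\hat\psi)$
converges in probability to zero, is a consistent estimator for $(\xi_0,\gamma_0,\theta_0,\psi_0)$.
Moreover, $V_0 = (V_{0\theta}\ V_{0\psi})$ as in Corollary 10.4 exists, and

$$V_{0\theta} = -E\int_0^\tau \left(\frac{1}{\xi_0},\ \frac{1}{\gamma_0} + \log t,\ I_{AZT},\ I_{PCP}(t),\ X(t)\right)^\top \left(\frac{1}{\xi_0},\ \frac{1}{\gamma_0} + \log t,\ I_{AZT},\ I_{PCP}(t)\right)\lambda(t)\,dt$$

and $V_{0\psi}$ is a five-dimensional vector with zeros in the first four positions
and

$$E\int_0^\tau \left(\frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}X_\psi(t)\right)(dN(t) - \lambda(t)\,dt)$$

in the fifth, with

$$\frac{\partial}{\partial\psi}X_\psi(t) = \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}1_{\{\text{treated at } s\}}\,ds.$$

If this $V_0$ is a nonsingular matrix, then there exists a sequence of (almost)
zeros $(\hat\xi,\hat\gamma,\hat\theta,\hat\psi)$ of (21). Furthermore, any such sequence is asymptotically
normal:

$$\sqrt{n}\left((\hat\xi\ \hat\gamma\ \hat\theta\ \hat\psi)^\top - (\xi_0\ \gamma_0\ \theta_0\ \psi_0)^\top\right) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0, V_0^{-1}W_0(V_0^{-1})^\top\right),$$

with $V_0$ the matrix above and

$$W_0 = E\int_0^\tau \left(\left(\frac{1}{\xi_0},\ \frac{1}{\gamma_0} + \log t,\ I_{AZT},\ I_{PCP}(t),\ X(t)\right)^\top\right)^{\otimes 2}\lambda(t)\,dt.$$

Moreover, $\hat\theta$ and $\hat\psi$ are asymptotically independent.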
The asymptotic independence here turns out to be no coincidence; see
Lok (2001, 2007). 



Proof of Lemma 10.5. The findings here are similar to the findings
in Theorem 10.2, but the boundedness conditions fail to hold here.
Consistency follows from Van der Vaart (1998), Theorem 5.9. Existence of a
sequence of (almost) zeros follows from Van der Vaart and Wellner (1996),
Section 3.9, Problem 9, whose solution is practically given by the hint below
it. Asymptotic normality follows from Van der Vaart (1998), Theorem 5.21.
The asymptotic variance equals $V_0^{-1}W_0(V_0^{-1})^\top$ because of Corollary 10.4 ($\lambda$
and $h$ are not bounded here, but we can restrict the interval to $[\varepsilon,\tau]$ for
$\varepsilon > 0$ and let $\varepsilon \downarrow 0$). We leave checking the conditions of these theorems to
the reader [or see Lok (2001), Section 7.7].

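Asymptotic independence of $\hat\theta$ and $\hat\psi$ follows by direct calculation, after
noticing that

$$V_0 = \begin{pmatrix} -B & 0 \\ -C & A_\psi \end{pmatrix}$$

with

$$A_\psi = E\int_0^\tau \left(\frac{\partial}{\partial\psi}\Big|_{\psi=\psi_0}X_\psi(t)\right)(dN(t)-\lambda(t)\,dt),$$

$$B = E\int_0^\tau \left(\left(\frac{1}{\xi_0},\ \frac{1}{\gamma_0} + \log t,\ I_{AZT},\ I_{PCP}(t)\right)^\top\right)^{\otimes 2}\lambda(t)\,dt$$

and

$$C = E\int_0^\tau X(t)\left(\frac{1}{\xi_0},\ \frac{1}{\gamma_0} + \log t,\ I_{AZT},\ I_{PCP}(t)\right)\lambda(t)\,dt.$$

Define

$$D = E\int_0^\tau X(t)^2\lambda(t)\,dt.$$

This concludes the proof:

$$V_0^{-1}W_0(V_0^{-1})^\top = \begin{pmatrix} -B & 0 \\ -C & A_\psi \end{pmatrix}^{-1}\begin{pmatrix} B & C^\top \\ C & D \end{pmatrix}\left(\begin{pmatrix} -B & 0 \\ -C & A_\psi \end{pmatrix}^{-1}\right)^\top = \begin{pmatrix} B^{-1} & 0 \\ 0 & A_\psi^{-1}(-CB^{-1}C^\top + D)(A_\psi^{-1})^\top \end{pmatrix}.$$

□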
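
The block-diagonal form of the sandwich matrix can be verified with exact arithmetic. The sketch below is only an illustration: the score vectors, the values standing in for $X(t)$ and the scalar playing the role of $A_\psi$ are hypothetical numbers chosen here, and $B$, $C$ and $D$ are built as plain sums of products so that $W_0$ has the same structure as in the proof.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse for a small nonsingular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical nuisance score vectors and hypothetical values standing in for X(t).
scores = [[F(1), F(2)], [F(0), F(1)], [F(3), F(-1)]]
xs = [F(2), F(5), F(-1)]
a_psi = F(7, 3)  # any nonzero scalar can play the role of A_psi

k = len(scores[0])
B = [[sum(s[i] * s[j] for s in scores) for j in range(k)] for i in range(k)]
C = [[sum(x * s[j] for x, s in zip(xs, scores)) for j in range(k)]]
D = sum(x * x for x in xs)

# V0 = [[-B, 0], [-C, A_psi]] and W0 = [[B, C^T], [C, D]].
V0 = [[-B[i][j] for j in range(k)] + [F(0)] for i in range(k)]
V0.append([-c for c in C[0]] + [a_psi])
W0 = [B[i] + [C[0][i]] for i in range(k)] + [C[0] + [D]]

V0_inv = inverse(V0)
sandwich = matmul(matmul(V0_inv, W0), transpose(V0_inv))

# Entries linking the nuisance parameters and psi.
off_diagonal = [sandwich[i][k] for i in range(k)] + sandwich[k][:k]
print(off_diagonal)
```

Because the arithmetic is exact, the off-diagonal entries come out as exact zeros rather than small floating-point numbers, which mirrors the algebra in the proof.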
11. Test for treatment effect without specifying a model for D. We show
that one can often test whether treatment affects the outcome of interest 
without specifying a model for D. This was conjectured, but not proved, 
in Robins (1998b). If one does not have to specify a model for D in order 
to test whether treatment affects the outcome, false conclusions caused by 
misspecification of the model for D can be avoided. 

Under the null hypothesis of no treatment effect, $D \equiv 0$ and $X(t) = Y$
(see Section 5). If there is no unmeasured confounding, $X(t)$ does not
predict $N(t)$ given $\bar Z_{t-}$ (see Theorem 9.2). Hence, if there is no unmeasured
confounding and no treatment effect, adding the observed outcome $Y$ to the
prediction model for treatment changes should not help the prediction. This
idea, presented in Robins (1998b) for the case of local rank preservation (see
Section 7), can be proven to be correct as follows.

Technically, the tests in this section are similar to the score test [for more
about the score test see, e.g., Cox and Hinkley (1974)]. Suppose that the
conditions of Section 9 are satisfied, and that we have a correctly specified
parametric model $\lambda_\theta$ for $\lambda$. Define

$$g_\theta(Y,\bar Z) = \int_0^\tau h_t^\theta(Y,\bar Z_{t-})\,(dN(t) - \lambda_\theta(t)\,dt),$$

with $h_t^\theta$ satisfying the regularity condition Restriction 8.1. The key idea
of this procedure is that if treatment does not affect the outcome, $D \equiv 0$,
so $X(t) = Y$, and $g_{\theta_0}(Y,\bar Z)$ has expectation zero because of Theorem 8.5.
Since $\theta_0$ is unknown, we base the test on the limiting behavior under $D \equiv 0$
of $\sqrt{n}\,\mathbb{P}_n g_{\hat\theta}(Y,\bar Z)$, where $\hat\theta$ is an estimator of the nuisance parameter $\theta_0$.
We will show that if $D \equiv 0$, $\sqrt{n}\,\mathbb{P}_n g_{\hat\theta}(Y,\bar Z)$ converges to a normal random
variable with expectation zero, which leads to a test for whether $D \equiv 0$ in
the usual way.

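The nuisance parameter $\theta_0$ will be estimated using some set of estimating
equations

$$\mathbb{P}_n\tilde g_\theta(\bar Z) = 0$$

with $E\tilde g_{\theta_0}(\bar Z) = 0$, $E\tilde g_\theta(\bar Z)$ differentiable in $\theta$ and $E\tilde g_{\theta_0}^2(\bar Z) < \infty$. A natural
choice would be a maximum (partial) likelihood estimator for $\theta_0$. We
suppose throughout this section that the resulting estimator $\hat\theta$ is consistent and
asymptotically normal with

(22) $$\sqrt{n}(\hat\theta - \theta_0) = -\left(\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}E\tilde g_\theta(\bar Z)\right)^{-1}\sqrt{n}\,\mathbb{P}_n\tilde g_{\theta_0}(\bar Z) + o_P(1),$$

as will usually follow from, for example, Van der Vaart (1998), Theorem 5.21.
If $\lambda_\theta$ and $h_t^\theta$ are sufficiently smooth,

$$\theta \mapsto \sqrt{n}\,\mathbb{P}_n g_\theta(Y,\bar Z)$$

is differentiable with respect to $\theta$, and a Taylor expansion around $\theta_0$ leads
to

$$\sqrt{n}\,\mathbb{P}_n g_{\hat\theta}(Y,\bar Z) = \sqrt{n}\,\mathbb{P}_n g_{\theta_0}(Y,\bar Z) + \mathbb{P}_n\dot g_{\tilde\theta}(Y,\bar Z)\sqrt{n}(\hat\theta - \theta_0),$$

with $\dot g_\theta$ the derivative of $g_\theta$ with respect to $\theta$ and $\tilde\theta$ between $\theta_0$ and $\hat\theta$. Since
$\hat\theta$ converges in probability to $\theta_0$, so does $\tilde\theta$. Therefore, usually,
$\mathbb{P}_n\dot g_{\tilde\theta}(Y,\bar Z) \to_P E\dot g_{\theta_0}(Y,\bar Z)$. Sufficient conditions under which this holds are
given in Appendix D, Lemma D.1. Because of (22) and the central limit theorem,
$\sqrt{n}(\hat\theta - \theta_0)$ converges in distribution. Therefore, an application of Slutsky's
lemma leads to

$$\sqrt{n}\,\mathbb{P}_n g_{\hat\theta}(Y,\bar Z) = \sqrt{n}\,\mathbb{P}_n g_{\theta_0}(Y,\bar Z) + E\dot g_{\theta_0}(Y,\bar Z)\sqrt{n}(\hat\theta - \theta_0) + o_P(1)$$

$$= \sqrt{n}\,\mathbb{P}_n g_{\theta_0}(Y,\bar Z) - E\dot g_{\theta_0}(Y,\bar Z)\,V_0^{-1}\sqrt{n}\,\mathbb{P}_n\tilde g_{\theta_0}(\bar Z) + o_P(1)$$

with $V_0 = \frac{\partial}{\partial\theta}\big|_{\theta=\theta_0}E\tilde g_\theta(\bar Z)$. If $D \equiv 0$, $X(t) = Y$, so that Theorem 8.5 implies
that also the expectation of $g_{\theta_0}$ is equal to zero. Therefore, the central limit
theorem can be applied on the vector with $\sqrt{n}$ on the right-hand side; it
converges to a normal random variable with expectation zero. Because of the
Continuous Mapping Theorem [see, e.g., Van der Vaart (1998), Chapter 18],
$\sqrt{n}\,\mathbb{P}_n g_{\hat\theta}(Y,\bar Z)$ then converges to a normal random variable with expectation
zero, too.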

Calculation of its limiting covariance matrix is standard [see, e.g.. 
Van der Vaart (1998), Chapter 18]. To save space, we omit that calcula- 
tion here. If desirable, one can use Theorem 8.5 and Lemma 10.3 to simplify 
the expression. 
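
To make the expansion above concrete, the following sketch computes a plug-in test statistic for scalar $\theta$ and scalar $g$. It is an assumption-laden illustration, not the article's omitted covariance calculation: the function name and arguments are hypothetical, the per-subject contributions are assumed to have been computed elsewhere, and the variance is estimated by the empirical variance of the contributions $g_{\hat\theta} - E\dot g\,V_0^{-1}\tilde g_{\hat\theta}$ suggested by the last display.

```python
import math


def score_type_test(g_hat, g_tilde_hat, g_dot_mean, v0):
    """z-statistic and two-sided p-value for H0: no treatment effect.

    Scalar theta and scalar g are assumed. All inputs are plug-in quantities
    evaluated at theta_hat (hypothetical argument names):
      g_hat        per-subject values of g_theta(Y_i, Z_i)
      g_tilde_hat  per-subject values of the estimating function for theta
      g_dot_mean   estimate of E[d/dtheta g_theta(Y, Z)]
      v0           estimate of d/dtheta E[g_tilde_theta(Z)]
    """
    n = len(g_hat)
    # Contributions g - E[g_dot] * V0^{-1} * g_tilde from the expansion.
    phi = [g - g_dot_mean / v0 * gt for g, gt in zip(g_hat, g_tilde_hat)]
    mean_g = sum(g_hat) / n
    mean_phi = sum(phi) / n
    var_phi = sum((p - mean_phi) ** 2 for p in phi) / n
    z = math.sqrt(n) * mean_g / math.sqrt(var_phi)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Toy numbers, purely illustrative.
z, p = score_type_test([1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 0.1, -0.1], 0.3, -2.0)
print(z, p)
```

For vector-valued $g$ the same idea would use the empirical covariance matrix of the contributions and a chi-square reference distribution; that extension is not shown here.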

Notice that a test for whether $D = D_0$ for any specific $D_0$ can be
constructed in exactly the same way. If we have a correctly specified model
$D_\psi$ for $D$, this thus also leads to a confidence region for $\psi_0$ in the usual way,
using the duality between testing and confidence regions: include those $\psi$
for which the null hypothesis $D = D_\psi$ is not rejected.

12. Discussion and extensions. The proof of consistency and asymptotic 
normality of the estimators presented in this article applies to continuous- 
time structural nested models. A similar proof applies to structural nested 
models in discrete time (when covariates are only measured at finitely many
fixed times $0 = \tau_0 < \tau_1 < \cdots < \tau_K < \tau_{K+1} = \tau$, which are the same for all
patients and known in advance). Lok, Gill, van der Vaart and Robins (2004)
argue without proof that consistency and asymptotic normality should hold
for discrete time structural nested models under reasonable assumptions;
the proof is completed with the current article, as follows. It is easy to see
that in discrete time, $\sum_{\tau_k \le t}\lambda(\tau_k) = \Lambda(t)$ is the compensator of $N$
with respect to $\sigma(\bar Z_t)$ [see, e.g., Lok (2001), Section 7.4]. The assumption of
no unmeasured confounding can be formalized as



The discrete-time counterparts of Theorem 8.5 and Theorem 9.2 follow im- 
mediately [see Lok (2001)]. Consistency and asymptotic normality follow in 
the same way as for continuous time models. 

The tests for treatment effect in this article can be carried out without 
specifying a model for treatment effect; that is, no model for D is needed. 
This is an important feature of the tests because it allows one to avoid 
false conclusions caused by misspecification of the model for D. In practice, 
it may be hard to specify a correct parametric model for the infinitesimal 
shift-function, D. Thus, it is good that specification of a model for D is not 
needed to test for treatment effect. 

The estimators in this article require the correct specification of a model 
for treatment effect and of a model for prediction of treatment changes. 
For the discrete-time setting, Robins (2000) has recently proposed estima- 
tors which are doubly robust. Doubly robust estimators are consistent and 
asymptotically normal if (i) the model for prediction of treatment changes
($\lambda$ in the current article) is correctly specified or if (ii) a regression model
of a blipped down outcome [$X_\psi(t)$ in the current article] on past treatment-
and covariate history $\bar Z_{t-}$ is correctly specified. In any case, the model for
treatment effect ($D_\psi$ in the current article) has to be correctly specified.

In this article estimation started with the specification of a model for
the infinitesimal shift-function, $D$. Interpretation of results may be easier
when one starts with a model like (4),
$Y^{(t)} - t \sim \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds = e^\psi \cdot DUR(t,Y) + 1 \cdot (Y - t - DUR(t,Y))$,
given $\bar Z_t$. Here, $DUR(t,u)$ is the
duration of treatment in the interval $(t,u)$. The main results in the current
article apply also to this model. The proofs for Theorems 8.5 and 10.2 do
not depend on $X$ being the solution to $X'(t) = D(X(t),t;\bar Z_t)$. The proof of
Theorem 9.2 does depend on $X'(t) = D(X(t),t;\bar Z_t)$, but it simplifies
considerably if (4) or (5) is used as a starting point. Let us show this for (4).
Define $X(t) = t + \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds$. Using the first part of the proof of
Theorem 9.2, for $t_1 < t$,




MZr,-). 



l^t,M(t)lA{Zt,)l 



(n) 

{X1,X2) 



iX{h)) 
where $DUR(t_1,t)$ is the duration of treatment in the interval $(t_1,t)$.
The function $h^{(n)}$ is measurable if the duration of treatment until $t$ is included in $Z(t)$.
Moreover, if $t \downarrow t_0$ and $y \to y_0$, then $h^{(n)}(y,t) \to h^{(n)}(y_0,t_0)$. Hence, it follows
immediately that $h^{(n)}$ satisfies Restriction 8.1, which concludes the proof.
Depending on the specific application, it will be more appropriate to start
with $Y^{(t)} - t \sim \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\,ds$ given $\bar Z_t$ or with a model for $D$; see, for
example, Example 5.3.
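
For a treatment history that is a finite set of on/off intervals, the expression $t + e^\psi \cdot DUR(t,Y) + (Y - t - DUR(t,Y))$ from model (4) can be evaluated directly. The sketch below is a minimal illustration; the function names and the interval representation are assumptions made here, and the convention that $X_\psi(t) = Y$ for $t \ge Y$ reflects the survival setting in which there is no treatment after the event.

```python
import math


def duration_treated(intervals, start, end):
    """Total time on treatment within (start, end) for disjoint (on, off) intervals."""
    if end <= start:
        return 0.0
    return sum(max(0.0, min(off, end) - max(on, start)) for on, off in intervals)


def x_psi(t, y, psi, intervals):
    """X_psi(t) = t + exp(psi) * DUR(t, y) + (y - t - DUR(t, y)); equals y for t >= y."""
    if t >= y:
        return y
    dur = duration_treated(intervals, t, y)
    return t + math.exp(psi) * dur + (y - t - dur)


def dx_dpsi(t, y, psi, intervals):
    """Derivative of X_psi(t) with respect to psi: exp(psi) * DUR(t, y)."""
    return math.exp(psi) * duration_treated(intervals, t, y)


# Hypothetical patient: treated between times 1 and 3, event at time 4.
history = [(1.0, 3.0)]
print(x_psi(0.0, 4.0, 0.0, history), x_psi(0.0, 4.0, math.log(2.0), history))
```

With $\psi = 0$ the function returns the observed outcome, in line with $X(t) = Y$ under no treatment effect.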

In the previous literature [see, e.g., Robins et al. (1992), Mark and Robins 
(1993), Witteman et al. (1998), Keiding et al. (1999) and Hernan et al. 
(2005)] applications have been carried out under the assumption of local rank 
preservation, where $X(t) = Y^{(t)}$ for each $t$. As pointed out in these papers,
and as discussed in Section 7, the assumption of local rank preservation 
is generally considered implausible. This article relaxes the assumption of 
local rank preservation in structural nested models. The estimators and 
tests applied in the previous literature are specific cases of the estimators 
and tests studied in this article, with the only difference that some of the 
estimators in the previous literature allow for censoring of Y. Aside from the 
issue of censoring, this article provides a mathematical foundation behind 
previous estimators, relaxes the specification of the counterfactual outcomes
as deterministic variables, and allows for a distributional interpretation of 
the estimators. 

Robins (1998b) conjectures that one can often use standard software to 
test whether treatment affects the outcome of interest (without specifying 
a model for $D$), and to estimate $\psi$. Lok (2001, 2007) shows that both
testing and estimation can also be considered from a partial likelihood point 
of view. As shown in Lok (2001, 2007), this approach leads to a subclass of 
the estimators and tests studied in this article which can indeed be calculated 
with standard software. Example 8.6 is a specific case of that. The possibility 
to use standard software may be a good reason to choose these estimators 
in practice. See Robins (1998b) and Lok (2001, 2007) for a more elaborate 
discussion. 

The approach adopted in the current article leads to a large class of 
estimators and tests. When treatment and covariates change at finitely many 
fixed times only, Robins (1993, 1997) proposes, without proof, an optimal 
procedure for survival and nonsurvival outcomes, respectively. The optimal 
choice of estimators or tests under the framework of this article is another 
intriguing topic for future research. 

The current article assumes a parametric model $\lambda_\theta$ for the prediction
of treatment changes. In practice, applications have often used a
semiparametric Cox model for $\lambda_\theta$. Lok (2001) shows that specifying $\lambda_\theta$ using
a semiparametric Cox model leads to unbiased estimating equations, which
just as in this article are martingales for the true parameters. Consistency
and asymptotic normality of the resulting estimators for $D$ still remain to
be shown and constitute interesting topics for future research.

In many applications, observations are censored. Robins (1998b) and 
Hernan et al. (2005) have proposed methods to deal with censoring that 
could potentially be adapted to the results in this paper. For D of the form 
of Example 5.1, Lok (2007) includes proofs with censoring due to the fact 
that the study ends, so-called administrative censoring, using ideas from 
these previous papers. 

The estimators in this paper depend on solving a differential equation for 
each observation. In the examples, these equations are simple enough to be 
solved analytically. If that is not possible, these equations should be solved 
numerically. It might be worth investigating how a small contamination 
of the solution $X(t)$ to the differential equation affects the estimates of
treatment effect. 
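
When no analytic solution is available, the equation $X'(t) = D(X(t),t;\bar Z_t)$ with final condition $X(\tau) = Y$ can be integrated backward in time. The sketch below uses the classical fourth-order Runge-Kutta scheme on a fixed grid; this is one standard choice made here for illustration, not a method prescribed by the article, and the two shift functions are hypothetical. With jumps in the treatment process, accuracy near the jump times is limited by the step size, so in practice one would rather solve between jump times as in Theorem A.1.

```python
import math


def solve_backward(d, y, tau, n_steps=2000):
    """Integrate X'(t) = d(X(t), t) backward from X(tau) = y with classical RK4.

    Returns the time grid (from tau down to 0) and the approximate solution on it.
    """
    h = tau / n_steps
    times = [tau - k * h for k in range(n_steps + 1)]
    xs = [y]
    for t in times[:-1]:
        x = xs[-1]
        # One RK4 step of size -h, moving from t down to t - h.
        k1 = d(x, t)
        k2 = d(x - 0.5 * h * k1, t - 0.5 * h)
        k3 = d(x - 0.5 * h * k2, t - 0.5 * h)
        k4 = d(x - h * k3, t - h)
        xs.append(x - h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return times, xs


# Smooth hypothetical shift function D(y, t) = 0.5 * y; exact X(0) = y * exp(-0.5 * tau).
_, smooth = solve_backward(lambda x, t: 0.5 * x, y=2.0, tau=4.0)

# Piecewise-constant shift function (1 - e^psi) * 1{treated at t}, treated on [1, 3).
psi = math.log(2.0)
_, stepwise = solve_backward(
    lambda x, t: (1.0 - math.exp(psi)) * (1.0 if 1.0 <= t < 3.0 else 0.0),
    y=4.0,
    tau=4.0,
)
print(smooth[-1], stepwise[-1])
```

Comparing the numerical value at time 0 with a closed form, where one exists, gives a rough idea of how much numerical error enters the estimating equations.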

I conclude with a discussion of the assumptions used in this article. The 
most important assumption in this article is the assumption of no unmea- 
sured confounding (Assumption 4.3). As discussed before, this assumption is 
valid if all information has been recorded which both (i) predicts treatment 
decisions and (ii) is an independent risk factor for the outcome of interest. 
The validity of the assumption of no unmeasured confounding cannot be 
tested statistically, and depends on the quality of the recorded information. 
Therefore, it is for subject matter experts to decide about the plausibility 
of the assumption of no unmeasured confounding. Second, we only estimate 
the effect of treatment for which a short duration of treatment has only a 
small effect on the distribution of the outcome of interest. The effect of the 
treatment on an individual patient may be large, as long as the probability 
of such an effect is small for any small duration of treatment. Third, the 
assumption of no instantaneous treatment effect (Assumption 8.4) is also 
restrictive: it excludes the estimation of the effect of treatments that have 
instantaneous effects, such as surgery or other point exposures. The remain- 
ing assumptions in this article are mostly benign. The assumption that the 
covariate- and treatment process can be represented by a cadlag process 
is generally accepted for most medical situations [see, e.g., Andersen et al. 
(1993)]. The functions ht and can be chosen such that the regularity 
conditions on these functions are satisfied. Even if ht is not bounded, it can 
often be approximated by bounded functions, and results may follow by a 
simple application of Lebesgue's dominated convergence theorem [see, e.g.,
Example 8.6]. The same is true for the boundedness condition
(Assumption 4.1) of the intensity process $\lambda$. Assumption 4.2 that the counterfactual
process $Y^{(\cdot)}$ is cadlag is impossible to verify, but it is a plausible and
convenient regularity condition.
APPENDIX A: SOME THEORY ABOUT DIFFERENTIAL EQUATIONS

Theorem A.1. Suppose that a function $D(y,t;\bar Z_t)$ satisfies the following:

(a) (Continuity between the jump times of $Z$). If $Z$ does not jump in
$(t_1,t_2)$, then $D(y,t;\bar Z_t)$ is continuous in $(y,t)$ on $[t_1,t_2)$ and can be
continuously extended to $[t_1,t_2]$.

(b) (Lipschitz continuity). For each $\omega \in \Omega$, there exists a constant $L(\omega)$
such that

$$|D(y,t;\bar Z_t) - D(z,t;\bar Z_t)| \le L(\omega)|y - z|$$

for all $t \in [0,\tau]$ and all $y, z$.

Suppose, furthermore, that, for each $\omega \in \Omega$, there are no more than finitely
many jump times of $Z$. Then, for each $t_0 \in [0,\tau]$ and $y_0 \in \mathbb{R}$, there is a unique
continuous solution $x(t;t_0,y_0)$ to

$$x'(t) = D(x(t),t;\bar Z_t)$$

with boundary condition $x(t_0) = y_0$, and this solution is defined on the whole
interval $[0,\tau]$.

This theorem follows from well-known results about differential equations; 
see, for example, Duistermaat and Eckhaus (1995), Chapter 2. 

For the next theorem, we also refer to Duistermaat and Eckhaus (1995), 
Chapter 2. It is a consequence of Gronwall's lemma. 

Theorem A.2. Suppose that $I$ is an open or closed interval in $\mathbb{R}$,
$f : I \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and $C : I \to [0,\infty)$ is continuous, and suppose that

(23) $$\|f(x,y) - f(x,z)\| \le C(x)\|y - z\|$$

for all $x \in I$ and $y, z \in \mathbb{R}^n$. Then, for every $x_0 \in I$ and $y_0 \in \mathbb{R}^n$, there is a
unique solution $y(x)$ of $y'(x) = f(x,y(x))$ with $y(x_0) = y_0$, and this solution
is defined for all $x \in I$. If $g : I \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and $z : I \to \mathbb{R}^n$ is a
solution of $z'(x) = g(x,z(x))$, then

$$\|y(x) - z(x)\| \le e^{\int_{x_0}^x C(\xi)\,d\xi}\|y(x_0) - z(x_0)\| + \int_{x_0}^x e^{\int_\xi^x C(\eta)\,d\eta}\|f(\xi,z(\xi)) - g(\xi,z(\xi))\|\,d\xi$$

for all $x, x_0 \in I$ with $x_0 < x$.
In Duistermaat and Eckhaus (1995) the interval is always an open
interval, but as is generally known, this can be overcome by extending both
$f$ and $g$ outside the closed interval $I$ by taking the values at the
boundary of $I$. This preserves the Lipschitz- and continuity conditions. Existence
and uniqueness on all of finitely many intervals implies global existence and
uniqueness.

We have a differential equation with end condition at $\tau$, so we are
interested in $x, x_0$ with $x < x_0$:

Corollary A.3. Suppose that the conditions of Theorem A.2 are
satisfied. Then, for every $x_0 \in I$ and $y_0 \in \mathbb{R}^n$, there is a unique solution $y(x)$
of $y'(x) = f(x,y(x))$ with $y(x_0) = y_0$, and this solution is defined for all
$x \in I$. If $g : I \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and $z : I \to \mathbb{R}^n$ is a solution of
$z'(x) = g(x,z(x))$, then

$$\|y(x) - z(x)\| \le e^{\int_x^{x_0} C(\xi)\,d\xi}\|y(x_0) - z(x_0)\| + \int_x^{x_0} e^{\int_x^s C(\xi)\,d\xi}\|f(s,z(s)) - g(s,z(s))\|\,ds$$

for all $x, x_0 \in I$ with $x < x_0$.

Proof. Put $\tilde y(t) = y(x_0 - t)$. Then

$$\tilde y'(t) = -y'(x_0 - t) = -f(x_0 - t, y(x_0 - t)) = \tilde f(t,\tilde y(t)),$$

where $\tilde f(t,y) = -f(x_0 - t,y)$. Thus, $\tilde y(t) = y(x_0 - t)$ is a solution of the
differential equation $\tilde y'(t) = \tilde f(t,\tilde y(t))$ with boundary condition $\tilde y(0) = y(x_0) = y_0$.
Define also $\tilde z(t) = z(x_0 - t)$ and $\tilde g(t,z) = -g(x_0 - t,z)$. Applying Theorem A.2 on
$\tilde y$ concludes the proof, as follows:

$$\|y(x) - z(x)\| = \|y(x_0 - (x_0 - x)) - z(x_0 - (x_0 - x))\| = \|\tilde y(x_0 - x) - \tilde z(x_0 - x)\| = \|\tilde y(t) - \tilde z(t)\|$$

with $t = x_0 - x > 0$. Notice that, because of equation (23),

$$\|\tilde f(t,y) - \tilde f(t,z)\| \le C(x_0 - t)\|y - z\| =: \tilde C(t)\|y - z\|$$

with $\tilde C(t) = C(x_0 - t)$. Hence, Theorem A.2 implies that

$$\|y(x) - z(x)\| \le e^{\int_0^t \tilde C(\xi)\,d\xi}\|\tilde y(0) - \tilde z(0)\| + \int_0^t e^{\int_\xi^t \tilde C(\eta)\,d\eta}\|\tilde f(\xi,\tilde z(\xi)) - \tilde g(\xi,\tilde z(\xi))\|\,d\xi$$

$$= e^{\int_0^t C(x_0 - \xi)\,d\xi}\|y(x_0 - 0) - z(x_0 - 0)\| + \int_0^t e^{\int_\xi^t C(x_0 - \eta)\,d\eta}\|f(x_0 - \xi,z(x_0 - \xi)) - g(x_0 - \xi,z(x_0 - \xi))\|\,d\xi.$$

For the first term, we do a change of variables; $\xi$ from $0$ to $t$, put $s = x_0 - \xi$;
$d\xi = -ds$. $0 < \xi < t$; $s$ from $x_0 - 0$ to $x_0 - t = x_0 - (x_0 - x) = x$. We conclude
that the first term is equal to

$$e^{\int_x^{x_0} C(s)\,ds}\|y(x_0) - z(x_0)\|.$$

For the second term, similar changes of variables can be done, resulting in
Corollary A.3. □

APPENDIX B: MIMICKING COUNTERFACTUAL OUTCOMES 

In this appendix we present conditions under which $X(t)$ mimics $Y^{(t)}$ in
the sense that it has the same distribution as $Y^{(t)}$ given $\bar Z_t$. This result
is used heavily in this article. For the proofs, which are lengthy and use
discretization, we refer to Lok (2001, 2004). Section B.2 deals with survival
outcomes, Section B.1 with other outcomes. Survival outcomes require a
different set of assumptions, as will become clear below. The conditions here
are somewhat more restrictive than the ones in Lok (2001, 2004), but they
are simpler.



B.1. Mimicking counterfactual nonsurvival outcomes. This section
contains a sufficient set of regularity conditions to have existence and
uniqueness of a solution $X(t)$ to (6), $X'(t) = D(X(t),t;\bar Z_t)$ with final condition
$X(\tau) = Y$, the observed outcome (see Figure 3). Furthermore, together with
Assumption 6.1 (consistency), they imply that $X(t)$ has the same
distribution as $Y^{(t)}$ given $\bar Z_t$.

The regularity conditions below should be read as the following: there
exist conditional distribution functions $F_{Y^{(t+h)}|\bar Z_t}$ such that all these
assumptions are satisfied. They can be relaxed to $h$ in a neighborhood of 0,
if this neighborhood does not depend on $Z$. We only consider $h \ge 0$, so the
derivative with respect to $h$ at $h = 0$ is always the right-hand derivative.

Assumption B.1 (Regularity condition).
• (Support).

(a) There exist finite numbers $y_1$ and $y_2$ such that all $F_{Y^{(t+h)}|\bar Z_t}$ have
the same bounded support $[y_1,y_2]$.

(b) All $F_{Y^{(t+h)}|\bar Z_t}(y)$ have a continuous nonzero density $f_{Y^{(t+h)}|\bar Z_t}(y)$
on $y \in [y_1,y_2]$.

(c) There exists an $\varepsilon > 0$ such that $f_{Y^{(t)}|\bar Z_t}(y) \ge \varepsilon$ for all $y \in [y_1,y_2]$,
$\omega \in \Omega$ and $t \in [0,\tau]$.

• (Smoothness). For every $\omega \in \Omega$:

(a) $(y,t,h) \mapsto F_{Y^{(t+h)}|\bar Z_t}(y)$ is differentiable with respect to $t$, $y$ and $h$
with continuous derivatives on $[y_1,y_2] \times [t_1,t_2) \times \mathbb{R}_{\ge 0}$ if $Z$ does not jump in
$(t_1,t_2)$, with a continuous extension to $[y_1,y_2] \times [t_1,t_2] \times [0,\infty)$.

(b) The derivatives of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $y$ and $h$ are bounded
by constants $C_1$ and $C_2$, respectively.

(c) $\frac{\partial}{\partial y}F_{Y^{(t)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial h}\big|_{h=0}F_{Y^{(t+h)}|\bar Z_t}(y)$ have derivatives with respect
to $y$ which are bounded by constants $L_1$ and $L_2$, respectively.

The support conditions may be restrictive for certain applications.
Nevertheless, most real-life situations can be approximated this way, since $y_1$
and $y_2$ are unrestricted and $\varepsilon > 0$ is unrestricted, too. Although the support
conditions may well be stronger than necessary, they simplify the analysis 
considerably and, for that reason, they are adopted here. The smoothness 
conditions allow for nonsmoothness where the covariate- and treatment pro- 
cess Z jumps. This is important, since if the covariate- and treatment process 
Z jumps, this can lead to a different prognosis for the patient and thus to 
nonsmoothness of the functions concerned. 

Theorem B.2 (Mimicking counterfactual outcomes). Suppose that
regularity Condition B.1 is satisfied. Then $D(y,t;\bar Z_t)$ exists. Furthermore, for
every $\omega \in \Omega$, there exists exactly one continuous solution $X(t)$ to
$X'(t) = D(X(t),t;\bar Z_t)$ with final condition $X(\tau) = Y$. If also Assumption 6.1
(consistency) is satisfied and there are no more than finitely many times $t$ for
which the probability that the covariate- and treatment process jumps at $t$ is
greater than 0, then this $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ for
all $t \in [0,\tau]$.

For a proof we refer to Lok (2001, 2004). 

B.2. Mimicking counterfactual survival outcomes. This section contains
a sufficient set of regularity conditions to have existence and uniqueness of
a solution $X(t)$ to equation (6), $X'(t) = D(X(t),t;\bar Z_t)$ with final condition
$X(\tau) = Y$, the observed outcome (see Figure 3). Furthermore, together with
Assumption 6.1 (consistency) and Assumptions B.3 and B.4 below, they
imply that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. The conditions
here are natural conditions if the outcome of interest $Y$ is a survival time.

As compared to Section B.1, we make two extra assumptions. The first is
a consistency assumption, stating that stopping treatment after death does
not change the survival time. The second assumption states that there is no
instantaneous effect of treatment at the time the patient died [notice that
the difference between $Y^{(Y)}$, the outcome with treatment stopped at the
survival time $Y$, and $Y$ is in treatment at $Y$].

Assumption B.3 (Consistency). $Y^{(t)} = Y$ on $\{\omega : Y < t\} \cup \{\omega : Y^{(t)} < t\}$.

Assumption B.4 (No instantaneous effect of treatment at the time the
patient died). $Y^{(t)} = Y$ on $\{\omega : Y = t\} \cup \{\omega : Y^{(t)} = t\}$.

Under these assumptions, treatment in the future does not cause or
prevent death at present or before:

Lemma B.5. Under Assumptions B.3 and B.4:

(a) For all $h \ge 0$: $Y^{(t+h)} = Y$ on $\{\omega : Y \le t\} \cup \bigcup_{h \ge 0}\{\omega : Y^{(t+h)} \le t\}$.

(b) For all $(y,t,h)$ with $y \le t + h$ and $h \ge 0$: $\{\omega : Y^{(t+h)} \le y\} = \{\omega : Y \le y\}$.

For a proof we refer to Lok (2001). 

If the outcome is survival, the support condition in Assumption B.1,
saying that all $F_{Y^{(t+h)}|\bar Z_t}$ have the same bounded support $[y_1,y_2]$, will not hold.
The reason for this is as follows. $\bar Z_t$ includes the covariate-measurements and
treatment until time $t$. If covariates and treatment were measured at time $t$,
it cannot be avoided to include in $\bar Z_t$ whether or not a patient was alive at
time $t$. Given that a patient is dead at time $t$ and given his or her survival
time, the distribution of this survival time cannot have the fixed support
$[y_1,y_2]$, which is independent of $t$. Also, given that a patient is alive at time
$t$, this is hardly ever the case; one often expects that $t$ is the left limit of
the support. Thus, in case the outcome is survival, the support condition
for Theorem B.2 has to be slightly changed.

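Assumption B.6 (Support). There exists a finite number $y_2 > \tau$ such
that:

(a) For every $\omega \in \Omega$ and $t$ with $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have support
$[t,y_2]$.

(b) For every $\omega \in \Omega$ and $t$ with $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}(y)$ for $h \ge 0$ have a
continuous nonzero density $f_{Y^{(t+h)}|\bar Z_t}(y)$ on $y \in [t + h,y_2]$.

(c) There exists a number $\varepsilon > 0$ such that, for all $\omega \in \Omega$ and $t$ with $Y > t$,
$f_{Y^{(t)}|\bar Z_t}(y) \ge \varepsilon$ for $y \in [t,y_2]$.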
Next we look at the differentiability conditions in Assumption B.1. It
does not seem reasonable to assume that $F_{Y^{(t+h)}|\bar Z_t}(y)$ is continuously
differentiable with respect to $h$ and $y$ on $(h,y) \in [0,\infty) \times [t,y_2]$ since, for
$y \le t + h$, $F_{Y^{(t+h)}|\bar Z_t}(y) = F_{Y|\bar Z_t}(y)$ [Lemma B.5(b)]. Therefore, the derivative
of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $h$ is likely not to exist at $y = t + h$ (and is
equal to zero for $y < t + h$). Also, the derivative of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect
to $y$ may not exist at $y = t + h$, because of the different treatment before
and after $t + h$. For survival outcomes, we replace the smoothness conditions
of Assumption B.1 by the following:

Assumption B.7 (Smoothness). For every $\omega \in \Omega$:

(a) If $Z$ does not jump in $(t_1,t_2)$ and $Y > t_1$, the restriction of
$(y,t,h) \mapsto F_{Y^{(t+h)}|\bar Z_t}(y)$ to $\{(y,t,h) \in [t_1,y_2] \times [t_1,t_2) \times \mathbb{R}_{\ge 0} : y \ge t + h\}$ is $C^1$ in $(y,t,h)$.

(b) The derivatives of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $y$ and $h$ are bounded
by constants $C_1$ and $C_2$, respectively, for $y \in [t + h,y_2]$.

(c) $\frac{\partial}{\partial y}F_{Y^{(t)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial h}\big|_{h=0}F_{Y^{(t+h)}|\bar Z_t}(y)$ have derivatives with respect to
$y$ which are bounded by constants $L_1$ and $L_2$, respectively, for $y \in [t + h,y_2]$.

The smoothness condition above concentrates on $y \ge t + h$. For $y \in [t,t + h)$
we can choose $F_{Y^{(t+h)}|\bar Z_t}(y) = F_{Y|\bar Z_t}(y)$ because of Lemma B.5(b). Because
of Assumption 6.1 (consistency), $F_{Y|\bar Z_t}$ has the same support as $F_{Y^{(t)}|\bar Z_t}$, so
$F_{Y|\bar Z_t}$ has support $[t,y_2]$ if $Y > t$ [Assumption B.6(a)]. Assume the following:

Assumption B.8 (Smoothness). For all $\omega \in \Omega$ and $t$ with $Y > t$, $F_{Y|\bar Z_t}(y)$
is continuous and strictly increasing on its support $[t,y_2]$.

Theorem B.9 (Mimicking counterfactual survival outcomes). Suppose 
that regularity Conditions B.6, B.7 and B.8 are satisfied. Then 
$D(y,t;\overline{Z}_t)$ exists. Furthermore, for every $\omega \in \Omega$, 
there exists exactly one continuous solution $X(t)$ to 
$X'(t) = D(X(t),t;\overline{Z}_t)$ with final condition $X(\tau) = Y$. If also 
Assumptions 6.1, B.3 and B.4 (consistency and no instantaneous treatment 
effect at time of death) are satisfied, then this $X(t)$ has the same 
distribution as $Y^{(t)}$ given $\overline{Z}_t$ for all $t \in [0, \tau]$. 

For a proof we refer to Lok (2001, 2004). 

APPENDIX C: MEASURABILITY ISSUES 

In most of this article we assume that the function which maps 
$(X(t), \overline{Z}_{t-})$ to $X(t_0)$, with $t_0 < t$, is a measurable 
function on $\mathbb{R} \times \overline{\mathcal{Z}}_{t-}$, with the 
projection $\sigma$-algebra on $\overline{\mathcal{Z}}_{t-}$ (see Section 2). 
Moreover, we sometimes assume that the function which maps 
$(X(t_0), \overline{Z}_{t-})$ to $X(t)$, with $t_0 < t$, is a measurable 
function on $\mathbb{R} \times \overline{\mathcal{Z}}_{t-}$. In this appendix 
we give sufficient conditions for this. If these two functions are 
measurable, $\sigma(\overline{Z}_t, X(t))$ is a filtration, and, moreover, 
$\sigma(\overline{Z}_t, X(t))$ is the same as $\sigma(\overline{Z}_t, X(0))$ 
[see equation (15) in Section 9]. 

Lemma C.1. Suppose that $D$ satisfies regularity Assumption 9.1 and 
that, for each $\omega \in \Omega$, $Z$ jumps at most finitely many times. 
Then the function which maps $(X(t), \overline{Z}_{t-})$ to $X(t_0)$, with 
$t_0 < t$, is a measurable function from 
$\mathbb{R} \times \overline{\mathcal{Z}}_{t-}$ to $\mathbb{R}$. Also, the 
function which maps $(X(t_0), \overline{Z}_{t-})$ to $X(t)$, with $t_0 < t$, 
is a measurable function from 
$\mathbb{R} \times \overline{\mathcal{Z}}_{t-}$ to $\mathbb{R}$. 

For the proof of this result, which is quite technical since many results on 
differential equations are nonconstructive, we refer to Lok (2001). The proof 
uses the idea behind Euler's forward method to approximate the solution to 
the differential equation. 
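
As an informal illustration of the Euler idea, and not of the proof in Lok 
(2001), the sketch below approximates the two maps of Lemma C.1 for a toy 
right-hand side. The function toy_D, the values of tau and y, and the number 
of steps are all assumptions made for this example; in particular, toy_D 
ignores the covariate history and is not the $D$ of this article. 

```python
import math


def toy_D(x, t):
    """Hypothetical right-hand side D(x, t); it ignores Z and is for illustration only."""
    return -0.5 * x


def euler(D, x_start, t_start, t_end, n_steps=10_000):
    """Approximate X(t_end) for X'(t) = D(X(t), t), given X(t_start) = x_start.

    If t_end < t_start the step is negative and the equation is
    integrated backward in time, as with a final condition.
    """
    h = (t_end - t_start) / n_steps
    x, t = x_start, t_start
    for _ in range(n_steps):
        x += h * D(x, t)  # one explicit Euler step
        t += h
    return x


tau, y = 2.0, 3.0                      # assumed end of follow-up and observed outcome
x0 = euler(toy_D, y, tau, 0.0)         # map X(tau) -> X(0), using X(tau) = Y
y_back = euler(toy_D, x0, 0.0, tau)    # map X(0) -> X(tau)
exact_x0 = y * math.exp(0.5 * tau)     # closed-form solution for this toy D
```

For this toy equation the backward and forward Euler maps are approximately 
inverse to each other, and x0 is close to the closed-form value exact_x0. 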

APPENDIX D: A CONVERGENCE RESULT 

The following lemma is a worked-out case of theory from Van der Vaart 
(1998), Chapter 19. 

Lemma D.1. Let $X_1, X_2, \ldots$ be i.i.d. random variables with values in a 
measurable space $\mathcal{X}$. Let $\{f_\theta : \theta \in \Theta\}$ be a 
collection of measurable functions from $\mathcal{X}$ to $\mathbb{R}^k$ 
indexed by a subset $\Theta \subset \mathbb{R}^m$ which contains an open 
neighborhood $\Theta_0$ of $\theta_0$. Suppose that $\theta \to f_\theta(x)$ 
is continuous on $\Theta_0$ for every $x \in \mathcal{X}$. Suppose also that 
there exists a measurable function $F$ on $\mathcal{X}$ such that 
$\|f_\theta\| \leq F$ for every $\theta \in \Theta_0$ and such that 
$EF(X_1)$ exists. Then if $\hat\theta$ converges in probability to 
$\theta_0$, 

$P_n f_{\hat\theta} \to^{P} E f_{\theta_0}(X_1)$, 

where $P_n$ indicates the empirical distribution of $X_1, X_2, \ldots, X_n$. 

Proof. Notice that 

$\|P_n f_{\hat\theta} - E f_{\theta_0}(X_1)\| \leq \|P_n f_{\hat\theta} - E f_{\hat\theta}(X_1)\| + \|E f_{\hat\theta}(X_1) - E f_{\theta_0}(X_1)\|$. 

We show that both terms converge to zero in probability. Choose 
$\Theta_1 \subset \Theta_0$ compact and such that it contains an open 
neighborhood of $\theta_0$. Example 19.8 from Van der Vaart (1998) implies 
that, under the conditions above, 

$\sup_{\theta \in \Theta_1} \|P_n f_\theta - E f_\theta(X_1)\| \to 0$ a.s. 

Since $\hat\theta \to^{P} \theta_0$ and $\Theta_1$ contains an open 
neighborhood of $\theta_0$, this implies that the first term converges in 
probability to zero. For the second term, notice that, on $\Theta_0$, 
$\theta \to f_\theta(x)$ is continuous in $\theta$ and that each of the 
components of $f_\theta$ is bounded by the integrable function $F$, so that 
Lebesgue's dominated convergence theorem implies that $E f_\theta(X_1)$ is 
continuous in $\theta$ on $\Theta_0$. Thus, since 
$\hat\theta \to^{P} \theta_0$, also the second term converges in probability 
to zero. □ 
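
A small simulation can make the statement of Lemma D.1 concrete. The sketch 
below is only an illustration under assumed choices that do not come from 
this article: standard normal $X_i$, the toy family 
$f_\theta(x) = (x - \theta)^2$, $\theta_0 = 0$, and $\hat\theta$ equal to the 
sample mean. On $\Theta_0 = (-1, 1)$ this family is dominated by the 
integrable function $F(x) = 2x^2 + 2$, so the conditions of the lemma hold 
and $P_n f_{\hat\theta}$ should be close to $E f_{\theta_0}(X_1) = 1$ for 
large $n$. 

```python
import random

random.seed(0)  # fixed seed so that the run is reproducible


def f(theta, x):
    """Toy family f_theta(x) = (x - theta)^2; an assumption made for illustration."""
    return (x - theta) ** 2


n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. N(0, 1) draws

theta_hat = sum(xs) / n  # sample mean, consistent for theta_0 = 0
pn_f_theta_hat = sum(f(theta_hat, x) for x in xs) / n  # P_n f at theta_hat
limit = 1.0  # E f_{theta_0}(X_1) = Var(X_1) = 1 under these assumptions
```

Comparing pn_f_theta_hat with limit for increasing n shows the convergence 
in probability that the lemma asserts; a single run is of course no 
substitute for the proof. 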